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ABSTRACT 


'  This  thesis  presents  and  evalnates  variants  of  the  ellipsoid 
algorithm  for  nonlinear  programming.  Two  primary  types  of  variants 
are  examined:  different  deep  cut  hyperplanes,  and  different 
strategies  for  testing  the  feasibility  constraints.  Five  of  the 
deep  cut  variants  do  not  require  a  linesearch,  while  three  do 
require  a  linesearch.  Of  the  five  constraint  examination  methods. 


three  are 

alternative 

ways 

of  examining 

the  full  list 

of 

feasibility 

constraints . 

The 

fourth  method 

is  an  active 

set 

strategy,  while  the  fifth  uses  a  record  objective  function  value 
constraint.  The  variants  are  tested  on  13  general  and  geometric 
programming  problems,  both  convex  and  nonconvex.  The  performance 
of  each  variant  is  measured  in  combined  solution  error  as  a 
function  of  solution  time.  The  experimental  results  show  that  the 
lowest  solution  error  achieved  is  essentially  unaffected  by  the 
constraint  examination  strategy.  However,  all  but  one  of  the  deep 
cut  variants  occasionally  converge  to  non-optimal  points  if  the 

problem  is  nonconvex.  Three  of  the  deep  cuts  and  three  of  the 

I* 

constraint  examination  strategies  are  shown  to  improve  the 
efficiency  of  the  algorithm.  The  most  efficient  variant  was  the 
active  set  strategy,  which  attained  almost  all  of  the  efficiency 
improvement  that  is  theoretically  possible  for  any  active  set 


strategy 


PART  1 


INTRODUCTION  AND  HISTORICAL  REVIEW 

In  [11] ,  Shor  describes  a  simple  method  for  solving  convex 
programming  problems,  which  has  since  become  known  as  the  ellipsoid 
algorithm  (EA) .  In  [6]  and  elsewhere,  Ecker  and  Kupferschmid  show 
that  the  EA  can  be  practically  applied  to  both  convex  and  nonconvex 
nonlinear  programming  problems,  and  that  it  is  more  robust  than 
some  other  methods  in  common  use.  On  many  problems  the  EA  is  also 
competitive  with  other  methods  in  terms  of  efficiency,  for  at  least 
some  levels  of  solution  error. 

Consider  the  nonlinear  programming  problem 

NLP:  min  f  . , (x) 
m+1 

x  €  Rn 

subject  to  x  6  S  =  {x  I  f^x)  <.0,  i  *  l...m). 

* 

Assume  that  there  exists  an  optimal  point  x  solving  the 
problem  NLP,  that  an  initial  n-dimensional  ellipsoid  E-  can  be 
found  with  x  €  E_,  and  that  the  intersection  of  Eq  with  the 
feasible  region  S  has  computationally  positive  volume  relative  to 
Rn.  The  EA  generates  a  sequence  of  successively  smaller  ellipsoids 

Ek  ■  (x  I  (x  -  xk)TQ^1(x  -  xk)  1  1} 


1 


each  containing  z 


We  use  only  rank  1  updates,  although  there  are 


rank  2  updates  as  well  [5].  Given  an  ellipsoid  in  this 

sequence,  a  cut  hvoerolane 


Hk  =  {x  I  -(g°)T(x  -  xk)  =  0} 


c 

is  constructed  passing  through  a  cut  point  x  with  cut  gradient 
c 

g  .  We  define  one  of  its  half-spaces 


=  (x  I  — (gC )T(x  -  xk)  l  0} 


as  the  cut  half space .  We  can  construct  cuts  which,  assuming 

convexity  of  f^...fn+j,  ensure  that  the  intersection  of  with 

*  + 

contains  x  .  We  call  a  cut  a  deep  cut  if 


half  of  E. 


contains  less  than 
The  construction  of  such  cuts,  and  their  affect  on  EA 


convergence,  is  presented  in  Part  3. 


Given  the  hyperplane  H^,  the  center  x  of  Ek+1  and  the 
positive  definite  matrix  used  in  defining  E^+^  are  determined 

by  the  simple  update  formulae 

k+1  k  .  ,c  , 
x  =  x  +  ad  and 

°k+l  =  b(Qk  "  cdC(dC)T) 


3 

d  =  -Q^g  / ( ( g  )  Q^tg  ))  , 

a  *  (1  +  na)/(n  +  1) , 

and,  for  n>l,  b  =  n2(l  -  a2)/(n2  -  1) 
and  c  =  2a/ (1  +  a) . 

The  depth  o£  cnt,  a,  is  calculated  as 

_  .  c.T.  k  c. ...  c .T  c.1/2 

a  =  (g  )  (x  -  x  )/((g  )  CL  g  ) 

c 

Geometrically,  d  is  a  vector  from  the  center  of  to  a  point  on 

c  c  k  c  c 

the  ellipsoid  boundary  y  (y  =  x  +  d  )  such  that  y  is  the 
minimizing  point  of  the  problem 

.  c.T 

Min  (g  )  x,  x  €  Ej,. 

The  parameter  a  is  the  ratio  of  the  distance  along  d^  from  the 

c 

centerpoint  to  H^,  divided  by  the  length  of  d  .  The  convergence  of 
the  EA  depends  on  the  depth  of  cut,  since  the  ratio  of  consecutive 
ellipsid  volumes  (see  Bland,  Goldfarb,  and  Todd  [1])  is 

((n2(l  -  a2))/(n2  -  l))((n  "  1)/2)(n(l  -  a)/(n  +  D) 

This  ratio  is  less  than  one  for  -(1/n)  <  a  <  1. 

Previous  research  ([12])  suggested  that  normalizing  the 
gradients  increased  the  stability  of  the  algorithm.  We  represent 
the  normalized  gradient  as  g.(x)  =  Vf  .  (x)/ il  Vf  (x)  II  . 


For 


efficiency  we  use  the  infinity  norm,  so  that  the  element  of  g 
largest  in  absolute  value  has  absolute  value  1. 

The  cut  point  and  cut  gradient  are  selected  after  determining 

which  function  f,...f  is  tc  be  used  when  constructing  H.  .  Part 
1  m+1  k 

4  of  this  thesis  will  examine  five  strategies  for  determining 

this  function.  Unless  otherwise  mentioned,  our  method  is  to  first 

k 

examine  the  m  feasibility  constraints,  to  determine  whether  x  €  S. 
1c 

If  x  €  S',  then  a  function  f.  has  been  found  with  f.  >0,  and  that 

l  i 

function  is  used  when  constructing  H^.  If  x^  £  S,  then  f  is 
used  when  constructing 

To  determine  if  xk€  S  in  the  above  process,  we  select  an 
examination  order  for  the  m  feasibility  constraints.  Unless 
otherwise  specified,  we  use  the  cyclical  ordering  of  4.1.2. 

In  the  EA  implemented,  if  x^  €  S  we  evaluate  fm+^(x*)  to  keep 
a  record  of  the  best  feasible  point  found  so  far.  This  best  point 
xr  is  called  the  record  point .  and  the  corresponding  objective 
function  value  fr  =  fm+^(xr)  is  called  the  record  value .  Until  a 
record  point  is  found,  we  let  fr  =  +<»  .  We  also  define 

G  =  {x  I  fn+1(x)  i.  G  is  the  lowest  objective  function  level 

* 

set  known  to  contain  x  ,  and  G  changes  whenever  a  new  record  point 
is  found.  Thus,  S  O  G  contains  all  feasible  points  with  objective 
function  values  at  or  below  fr.  Therefore  x  €  (SO  G),  and  if  a 
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record  point  xr  has  been  found  then  xr  €  (S  f  G) .  The  limit  point 

r  • 

of  the  sequence  of  record  points  x  is  the  optimal  point  x  .  We 
call  S  fl  G  the  solotion-containina  for  NLP. 

When  is  used  in  constructing  H  ,  we  call  the  resulting 

cut  an  ontimalitv  cut ;  otherwise,  the  cut  is  a  feasibility  cut .  We 

refer  to  the  point  x^  as  a  Phase  1.  point  or  a  Phase  2  point,  and  to 

iteration  k  as  a  Phase  1  iteration  or  a  Phase  2  iteration,  based  on 
k  k 

whether  x  €  S'  or  x  €  S.  At  each  iteration,  by  the  time  that  we 

know  which  function  will  be  used  for  H^,  and  before  we  must 
c  c 

determine  x  and  g  ,  the  constraint  selection  strategy  above  will 
have  classified  x^  as  being  in  S',  or  in  S  fl  G',  or  in  S  fl  G. 


The  ellipsoid  update  formulae 
D  E^)  C  .  To  ensure  that  x  e 
with  certain  properties.  For  each  i  - 


presented  above  ensure  that 
Ej.+^,  we  must  select  cuts 
1 . . .m,  let 


S.  =  {x  I  f .(x)  <  0} 

l  l 

be  the  feasible  region  for  the  constraint  f.(x)  <  0. 

i  — 

Consider  a  set  C  C  R&,  a  subset  of  which  may  be  contained  in 
E^.  If  the  cut  hyperplane  is  constructed  so  that  C  C  H*,  then 

(C  0  E^)  C  E^+^  because  (H*  fl  E^)  C  Ej_+^ .  We  say  such  a  cut 


preserves  the  set  C. 


and 


The  EA  described  above  uses  a  feasibility  cut  if  z  €  S', 

an  optimality  cut  if  xk  €  S.  If  fj...fn+^  are  convex  then  these 

* 

cuts  preserve  x  and  we  term  these  cuts  solution-preserving .  A 
feasibility  cut  on  the  (violated)  i-th  constraint  is 

solution-preserving  if  H+  because  then 

*  + 

x  e  s  c  s .  c  h.  . 

i  k 

An  optimality  cut  is  also  solution-preserving  if  G  C  because 
then 

x*  €  (S  D  G)  C  G  C  H*. 

Thus,  our  various  feasibility  cuts  should  preserve  and  our 
optimality  cuts  should  preserve  G  to  be  solution-preserving. 

The  center  cut  strategy  used  by  Shor  [151  and  by  Ecker  and 

c  k  c  k 

Kupferschmid  [6]  uses  x  -  x  ,  and  g  =  g^(x  ),  where  i  is  the 
index  of  the  function  used  to  create  H^.  These  cuts  can  be  shown 
to  be  solution-preserving  by  use  of  the  support  inequality  at  xk. 
For  a  feasibility  cut,  actually  supports 

L  -  (x  |  f ,(x)  <  f,(xk)}. 
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and  C  since  f^(x*)  2  0.  For  an  optimality  cut,  actually 
supports 


Lm+1  =  {I  1  fm+l<x)  1  fm+l(xk))' 


and  GCLi(  since  f  ^.(x*)  >  f *. 
m+1  m+1  ~ 


All  but  two  of  the  deep  cut  variants  also  select  to  be  a 
support  hyperplane  to  a  level  set  which  contains  the  level  set 
to  be  preserved. 


For  Part  3  where  deep  cuts  are  examined,  we  attempt  to  make 

the  deepest  cut  possible,  to  gain  the  greatest  ellipsoid  volume 

reduction.  It  can  be  seen  that  for  a  center  cut  a  »  0,  and  for  a 

deep  cut  a  >  0.  If  a  deep  cut  algorithm  results  in  a  <  0,  we  would 

instead  use  the  center  cut  above.  The  equation  for  a  shows  that 

c  1c 

a  >  0  requires  that  x  j*  x  and  that  the  directional  derivative 
c  ]c 

of  f^  along  x  -  x  must  be  negative. 

Another  consideration  for  deep  cut  variants  is  to  ensure  that 
o  <  1  for  the  ellipsoid  update  formulae.  If  xC  €  then  I  a  I  2  1; 

otherwise,  a  may  exceed  these  bounds.  If  a  =  1  then  consists 

solely  of  the  point  yC.  If  a  >  1  then  (S  fl  G)  fl  =  0,  indicating 
that  x  does  not  lie  in  E^.  When  this  occurs  after  a  record  point 
has  been  found,  it  is  taken  as  evidence  of  numerical  roundoff  error 


ia  the  calculation  of  a,  or  that  is  no  longer  positive  definite. 
If  a  variant  calculates  a  2,  1,  ve  make  a  center  cat  to  allow  the 
algorithm  to  continue. 

The  progress  of  an  EA  using  center  cuts  is  illustrated 
graphically,  for  a  hypothetical  problem  having  n  -  2,  in  [12]. 


PART  2 


EXPERIMENTAL  METHODS 


2.1  Test  Problems 

The  test  problems  chosen  for  this  study  consist  of  S  general 
nonlinear  problems  and  8  geometric  programming  problems,  of  which  3 
are  convex  and  10  are  nonconvex.  Table  2.1  summarizes  the  13 
problems. 


Table  2.1  Test  Problems:  General  Information 


^ - 

1  problem 

-  1 

1 

convex? 

-t— - 

1  n 

4- 

1 

m 

*T 

1 

1  Colville 

1  1 

yes 

1  3 

1 

15 

1 

1  Colville 

2  I 

no 

1  15 

1 

20 

1 

1  Colville 

3  1 

no 

1  5 

1 

16 

1 

1  Colville 

4  1 

no 

1  4 

1 

8 

1 

1  Colville 

8  1 

no 

1  3 

1 

20 

1 

1  Dembo  lb 

1 

yes 

1  12 

1 

3 

1 

1  Dembo  2 

1 

no 

1  S 

1 

9 

1 

1  Dembo  3 

1 

no 

1  7 

1 

15 

1 

1  Dembo  4a 

1 

no 

1  8 

1 

4 

1 

I  Dembo  5 

1 

no 

1  8 

1 

6 

1 

1  Dembo  6 

1 

no 

1  13 

1 

18 

1 

1  Dembo  7 

1 

no 

1  16 

1 

25 

1 

1  Dembo  8a 

1 

yes 

1  7 

t 

1 

-4- 

4 

1 

— i. 

Notes: 

..  For  statements  of  the  Colville  problems,  see  [2];  for 
statements  of  the  Dembo  problems,  see  [3].  The 
constraints  are  indexed  here  the  same  as  in  [2]  and  [3]. 


Colville  8  involves  functions  for  which  analytical 
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derivatives  cannot  be  given,  so  finite  differencing  was 
used  to  approximate  the  gradients  of  those  functions. 

..  The  statement  of  Dembo  2  in  [3]  is  imprecise  and  contains 
some  typographical  errors;  see  [12]  for  a  correct  problem 
statement . 

..  In  Dembo  3,  Dembo  6,  and  Dembo  7,  some  of  the  constraints 
are  explicit  bounds  on  variables. 

In  order  to  guarantee  that  each  strategy  solves  the  same 
problems,  the  data  necessary  to  define  a  particular  problem  is 
given  in  a  single  data  structure  accessed  by  each  of  the 
strategies . 

For  each  test  problem,  we  use  the  vectors  from  [2]  and  [3] 

of  upper  and  lower  bounds  xh  and  x1  on  the  variables.  The  starting 
0 

point  x  is  chosen  as  the  midpoint  of  these  bounds.  The  ellipsoid 
algorithm  requires  that  an  initial  ellipsoid  EQ  be  given  which 
contains  the  optimal  point,  and  we  select  as  the  ellipsoid  of 

minimum  volume  containing 

(x  I  x1  <  x  1  xh} . 


Given  a  test  problem  and  a  starting  point,  each  EA  variant  is 
allowed  to  run  until  it  has  obtained  the  best  solution  of  which  it 
is  capable.  At  each  iteration,  the  current  centerpoint  xk,  the 
current  objective  function  value  fo+^(xk),  and  the  computer  time 

used  so  far  are  written  in  a  file.  After  the  experiment  is  over, 

these  performance  measurements  are  used  to  analyze  the  behavior  of 
the  EA  variant  that  was  used. 

After  Eason  and  Fenton,  [4],  we  use  error  versus  effort 

curves  to  display  the  convergence  trajectories  of  the  various  EA 
variants.  The  error  measure  that  we  use  here  is  computed  as 

follows.  First,  the  combined  error  measure 

m 

e(xk)  =  If  .,(xk)  -  f  .(x  )|  +  5  A.lf.(xk)l 

m+1  m+1  .  ,  1  i 

i=l 

k 

is  cosqtuted  for  each  iterate  x  in  the  solution  process,  where  the 
are  the  Lagrange  multipliers  at  optimality.  These  values  are 
then  normalized  to  obtain  the  relative  combined  error  measure 

E(xk)  =  e(xk)/e(x°) , 

where  x^  denotes  the  common  starting  point  of  the  variants.  The 


common  logs  of  the  E(x  )  are  plotted  versus  the  measure  of  effort 
used  so  far.  Details  regarding  the  calculation  of  the  optimal 
Lagrange  multipliers  are  given  in  [8]. 

The  measure  of  effort  used  for  the  error  curves  here  is  the 
problem  state  central  processing  unit  (PSCPU)  time  used  by  the  EA 
variant.  In  2.2.2  below  and  in  [12]  and  [6],  complete  details  are 
given  regarding  the  determination  of  PSCPU  time  used  by  a  strategy 
in  the  solution  process.  In  summary,  we  turn  a  timer  on  and  off  so 
as  to  measure  only  the  effort  nsed  in  performing  the  steps  of  the 
algorithm,  thereby  excluding  from  the  measurements  any  time  used 
for  input  and  output  operations,  for  other  tasks  performed  only  as 
conveniences  to  the  experimenter,  and  for  the  performance 
measurement  process  itself.  Extensive  experiments,  see  [12],  have 
shown  that  our  method  of  measuring  PSCPU  time  is  accurate, 
reproducible,  and  substantially  uncontaminated  by  system-load 
effects  and  other  influences  external  to  the  experiments. 

The  construction  of  meaningful  error  curves  using  the  process 
described  above  requires  the  optimal  solution  to  be  known  to 

considerably  more  precision  than  is  usually  reported  in  the 

* 

literature.  We  therefore  use  very  accurate  solutions  x  in  the 
construction  of  the  error  curves.  These  solutions  are  also  chosen 
to  be  strictly  feasible;  see  [13]  for  exact  problem  statements  and 
the  b«st  strictly  feasible  point  known  to  us  for  each  of  the  test 


problems  used. 


To  summarize  the  information  contained  in  the  error  carves,  we 
provide  tables  showing  the  solution  accuracy  and  algorithm 
efficiency  achieved  using  each  EA  variant,  for  each  problem.  The 
measure  of  accuracy  is  the  common  log  of  the  smallest  relative 
combined  solution  error  E(x^)  attained  using  the  variant. 

As  an  absolute  measure  of  efficiency,  it  is  possible  to  table 
the  PSCPU  time  needed  for  each  EA  variant  to  reduce  the  solution 
error  to  various  levels.  Ve  instead  report  a  measure  of  relative 
efficiency  that  compares  efficiency  over  all  error  levels 
attained.  This  is  possible  when  comparing  variants  of  the  ellipsoid 
algorithm  to  one  another  because  the  error  versus  effort  curves  for 
a  given  problem  usually  are  qualitatively  similar  in  appearance. 
They  differ  primarily  by  a  scale  factor  in  the  effort  values  (and 
occasionally  in  the  lowest  level  of  solution  error  attained). 
Figures  2.1  and  2.2  are  included  here  to  demonstrate  this 
similarity  in  shape,  and  the  derivation  of  our  measure  of  relative 
efficiency. 

Thus,  on  a  given  problem,  the  times  required  by  two  EA 

variants  to  attain  each  error  level  are  in  roughly  a  constant 

ratio.  We  model  the  times  as  t ..  =  st .  ,  where  t.  is  the  time 

jb  ja  jq 

required  by  variant  Q  to  reach  error  level  j. 


The  effort  scale 


00*02- 


i 


c.n 

Vdi  idii  L  o 

1  1 

- 1_ 

1  i 

- 1 - _l - 

t  - 

- 1 

-1 - - - 1 - - - 4 - 1 — i - 1 

3.00  6.00  9.00  12.00  15.00 


Total  PSCPU  Time  (sec) 


Figure  2.1  Relative  Efficiency 


we  estimate  the 


Given  error  levels  [log  E(x.)].,  j  =  1...J  , 

k  j  q 

relative  efficiency  s  by  the  regression 


min  1 
s  R 


1 


j=l 


where  J  =  min  (J  ,J.) . 

a  b 


We  use  [log  E(x^)].  values  of  0,  -1/6,  -2/6,..., -J  /6,  with  J  the 

J  q  q 

largest  integer  such  that 


-J  /6  <  min  log  E(x  ) 
q  "  k 


For  example,  on  Figure  2.1  variant  A  attained  a  lowest  log  error 
level  of  -15.26  and  variant  B  attained  a  lowest  level  of  -16.17. 
Thus  J  =  91  error  levels  from  0  to  -15.17  were  compared.  The 
best-fit  value  of  s  calculated  by  the  regression  on  this  problem 
was  0.69. 


The  error  versus  effort  curves  in  Figure  2.2  again  demonstrate 
the  similarity  of  shape  between  EA  variants  on  a  given  problem. 
The  relative  efficiencies  calculated  for  variants  B  and  C  were  0.82 
and  1.03  respectively.  Figure  2.2  also  shows  that  sometimes  the 
curves  for  different  variants  overlap  to  an  extent  that  they  are 
almost  indistinguishable.  For  this  reason,  the  order  in  which  the 


variants  are  labelled  from  top  to  bottom  on  the  page  corresponds  to 
the  curves  which  are  the  highest  to  the  lowest  on  the  page  (arrows 


L8 


from  the  labels  are  also  provided  when  they  can  be  used  to 
eliminate  confusion  between  nearly-overlapping  curves).  On  Figure 
2.2  the  curves  for  variants  A  and  C  almost  overlap,  but  variant  C 
is  the  curve  which  is  slightly  higher  on  the  page  in  the  -12  to  -15 
error  range . 

In  each  section  below,  the  variant  most  similar  to  that 
implemented  in  the  routine  EA3  of  Kupferschmid,  Nairn,  and  Ecker, 
[10],  is  chosen  as  the  standard  variant  A  against  which  the  other 
variants  are  compared.  The  resulting  regression  problems  are 
solved  using  bisection  (that  is,  EA  with  n  =  1  and  m  -  0) .  Since 
the  repeatability  of  the  PSCPU  time  measurements  is  about  +  2%,  we 
consider  relative  efficiencies  s  in  the  range  [.98,1.02]  not  to  be 
significant  in  the  comparison  of  two  variants,  and  s  values 
outside  that  range  to  be  presumptive  evidence  for  the  superiority 
of  one  variant  over  the  other. 

Note  that  this  technique  compares  relative  efficiency  of  the 
EA  variants  during  the  majority  of  the  convergence  trajectory,  when 
the  error  log  E(x*)  is  decreasing.  Although  some  EA  convergence 
trajectories  finally  depart  from  this  decrease,  this  does  not  much 
affect  the  results  of  the  relative  efficiency  technique. 


In  addition,  this  technique  does  not  assume  that  the  error 
versus  effort  curves  are  nearly  linear,  even  though  this  is  often 


the  case.  When  the  error  versns  effort  carves  are  nonlinear 

(perhaps  consisting  of  two  near-linear  segments),  the  model  is 
still  appropriate  because  the  variants  still  differ  primarily  by 
the  single  scale  factor  along  the  effort  valnes.  Appendix  D 

contains  the  complete  set  of  error  versus  effort  figures  for  the 
variants  and  problems  considered. 


The  experimental  software  was  designed  to  collect  a  large 
number  of  statistics  which  might  yield  farther  information  about  EA 
behavior.  Much  of  the  information  desired  was  easily  available  by 
inserting  counting  or  averaging  statistics  at  apppropriate  points 
in  the  code  path.  Some  primary  statistics  of  this  type  that  were 
collected  are: 

1.  Count  of  function  evaluations  performed. 

2.  Count  of  centerpoints  xk  by  regions  S'  versus 
S,  and  G*  versus  G. 

3.  History  of  functions  f^...f  ^  used  to  create 

V 

4.  Function  values  for  violated  constraints  and 
depths  of  cut  achieved,  for  feasibility  versus 
optiawlity  cuts. 

5.  Counts  of  successful  and  unsuccessful  cuts,  and 
depths  of  cut  achieved,  for  each  deep  cut 
variant . 

The  information  provided  by  these  statistics  was  invaluable  in 
validating  the  operation  of  the  code  segments,  analyzing  how  the 
algorithm  was  behaving,  and  suggesting  areas  for  possible 
improvements.  For  example,  item  3  above  suggested  that  the  active 
set  strategy  of  4.2  should  be  investigated. 


2.3_.2  Timing  Subrontine 

The  effort  measurement  process  devised  by  Kupferschmid  [12] 
allows  the  software  effort  expended  to  be  categorized  as  being  for 
algorithm  purposes  (such  as  updating  the  ellipsoid),  for 
convenience  purposes  (such  as  maintaining  the  statistics  above), 
and  for  timer  purposes  (the  actual  calls  to  the  timing  code).  The 
three  time  categories  are  called  TALG,  TOTHER ,  and  TREC 
respectively.  The  timer  subroutine  itself  is  called  RECORD.  The 
process  uses  a  global  timer  on  and  off  switch  to  indicate  whether 
or  not  the  PSCPU  time  should  be  considered  as  algorithm  time. 
However,  the  process  did  not  allow  algorithm  time  to  be  separated 
into  categories  such  as  time  for  finding  a  violated  constraint, 
calculating  the  cut  point,  updating  the  ellipsoid,  etc.  Since  we 
hoped  to  improve  the  efficiency  of  the  algorithm  by  reducing  the 
effort  required  to  perform  various  steps,  the  knowledge  of  the  time 
spent  by  the  algorithm  in  these  various  areas  was  important. 


Briefly,  RECORD  could  be  called  for  five  purposes 
(initialization,  start,  stop,  update,  and  finish).  Within  RECORD, 
every  call  included  one  call  to  the  PSCPU  clock  counter 
(4,096,000,000  counts  per  second).  Internal  to  RECORD,  counts  were 
accumulated  in  integer  counterparts  (ALGCNT,0THCNT,RECCNT)  to  the 
floating  point  time  values  TALG,  TOTHER,  and  TREC. 


An  initialization  call  would  initialize  the  storage  area  and 
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get  the  current  PSCPU  clock  count.  A  subsequent  start  call  would 
get  the  current  PSCPU  clock  count,  and  accumulate  the  elapsed  count 
from  the  previous  call  into  OTHCNT.  Later,  a  stop  call  would  get 
the  currrent  PSCPU  clock  count,  and  accumulate  the  elapsed  count 
from  the  previous  start  call  into  ALGCNT.  As  desired,  update  calls 
could  be  made  to  convert  the  cumulative  PSCPU  clock  counts  into 
PSCPU  times,  and  apply  the  correction  factors.  Finally,  a  finish 
call  could  be  made  to  signify  the  end  of  timing  operations  and  to 
calculate  the  final  time  statistics. 

RECORD  had  five  calibration  factors  which  were  used  so  that 
the  final  time  statistics  did  not  include  PSCPU  time  in  the  wrong 
categories.  The  count  difference  between  every  pair  of  RECORD 
calls  had  some  PSCPU  counts  which  would  get  reported  as  algorithm 
or  convenience  counts,  but  really  were  spent  within  RECORD  (while 
RECORD  was  called,  performed  the  operation  it  was  called  for,  and 
returned).  Five  different  parameters  were  required  because  the 
operations  performed,  and  thus  the  PSCPU  counts  used,  were 
different  for  the  various  calls.  Sequential  or  simultaneous 
calibration  of  the  five  parameters  was  a  very  lengthy  process;  thus 
adding  the  subcategorization  capability  to  the  existing  RECORD 
would  be  difficult. 


Instead  we  designed  a  much  simpler  RECORD  with  improved 
accuracy,  and  only  a  single  calibration  parameter.  This  new  RECORD 


was  then  given  the  sabcategorization  capability.  The  simplicity  of 
the  new  RECORD  is  achieved  by  directly  timing  all  variable  length 
code  paths. 

The  new  RECORD  first  calls  the  PSCPU  clock  counter,  then 
performs  the  variable  code  paths,  and  finally  calls  the  PSCPU  clock 
counter  a  second  time.  Within  RECORD,  the  code  paths  before  the 
first  call  to  the  PSCPU  counter  and  after  the  call  to  the  second 
PSCPU  counter  are  invariant.  The  count  difference  between  the 
second  and  the  first  PSCPU  counts  within  a  call  to  RECORD  directly 
measures  the  variable  code  path,  and  is  added  to  RECCNT.  The 
difference  between  the  first  PSCPU  count  of  a  given  RECORD  call  and 
the  second  PSCPU  count  of  the  previous  RECORD  call  is  added  to 
either  ALGCNT  or  OTHCNT,  as  before.  A  single  calibration  factor 
CALCNT  estimates  the  PSCPU  counts  in  this  interval  which  were  spent 
within  RECORD.  This  calibration  factor  is  then  added  to  RECCNT  and 
subtracted  from  the  ALGCNT  or  OTHCNT  that  was  just  incremented. 

In  addition  to  reducing  the  calibration  parameters  from  five 
to  one,  the  new  version  of  RECORD  also  maintains  the  parameter  as  a 
integer  count  rather  than  as  a  floating-point  time  constant. 
Further,  it  eliminates  th*>  necessity  for  initializing,  updating, 
and  finishing  calls. 

The  diagram  below  shows  how  the  new  RECORD  calculates  TALG, 


TOTHER,  and  TREC.  Cl  through  C4  represent  increasing  values  of  the 
PSCPU  counter. 


(  A  start  call 
(  to  RECORD 
( 

—  <-h - (- 

Cl  C2 

Notes: 

..  On  the  first  call  to  RECORD,  all  variables  are 
initialized.  Assuming  that  the  timer  mechanism  has  not 
been  in  use,  the  time  before  Cl  is  ignored. 

..  In  the  time  after  C2  but  before  returning  from  RECORD, 
C2  -  Cl  is  added  to  RECCNT.  TREC  is  updated  from  RECCNT. 

..  After  C3  and  before  C4,  C3  -  C2  is  added  to  ALGCNT.  CALCNT 
is  subtracted  from  ALGCNT,  and  CALCNT  is  added  to  RECCNT. 
TALG  is  updated  from  ALGCNT. 

. .  After  C4  but  before  returning  from  RECORD,  C4  -  C3  is  added 
to  RECCNT,  and  TREC  is  updated  from  RECCNT. 


A  similar  diagram  explains  the  simple  calibration  process. 
For  the  calibration,  we  use  CALCNT  =  0  so  no  corrections  are 
subtracted  from  ALGCNT.  If  we  perform  a  a  start  call  followed  by  a 
stop  call  with  no  intervening  operations,  then  C3  -  C2  is  reported 
as  ALGCNT.  This  value  of  ALGCNT  represents  the  counts  used  by  the 
invariant  entry  and  exit  portions  of  the  RECORD  code.  Thus,  this 


)( 

)( 

)( 

•)(■ 


Algorithm 
steps  being 
timed 


)( 

)( 

)( 

->(- 


A  stop  call 
to  RECORD 


C3 


--i — 

C4 


value  of  ALGCNT  is  the  value  desired  for  CALCNT. 


seconds).  When  this  constant  is  used  in  the  model  and  the 

start-stop  pair  process  is  repeated,  the  ALGCNT  value  (the  nnll) 

should  be  near  zero  since  no  algorithm  steps  are  performed  between 

C2  and  C3.  The  nulls  of  the  new  RECORD  were  compared  with  the 

old  nulls.  The  results  demonstrated  that  nulls  of  the  new  version 

had  the  same  degree  of  variability  as  the  old  nulls,  approximately 

10  **  seconds  per  pair.  However,  the  means  of  the  new  nulls  were 

centered  at  zero,  while  the  means  of  the  old  nulls  centered  near 
-7 

7x10  seconds  per  pair. 

Also,  we  tested  the  EA  using  the  new  timer  on  the  test 

problems,  to  see  how  much  TALG  varied.  The  algorithm  times  were 


consistently  within  2  percent  of  each  other,  both  when  runs  under 


the  new  RECORD  were  compared  with  each  other,  and  when  runs 
under  the  new  RECORD  were  compared  with  runs  under  the  old  RECORD. 
During  the  sequence  of  experiments  reported  here,  we  further 
attempted  to  minimize  the  effect  of  extraneous  factors  by  running 
the  experiments  between  midnight  and  6  am,  when  the  system 
workload  was  at  a  minimum. 

A  disadvantage  to  using  the  new  RECORD  was  that  it  increased 
TREC  for  each  run,  and  thus  slightly  increased  the  cost  of  each 
run.  TREC  increased  for  two  reasons.  First,  each  call  to  RECORD 
was  more  expensive,  because  the  count-to-time  conversion  was  being 
done  at  each  call.  Second,  in  addition  to  the  start-stop  pairs  of 
calls  already  in  the  code,  extra  calls  to  RECORD  were  required  to 
organize  the  subcategorization  of  algorithm  time. 

The  new  RECORD  explained  above  has  the  ability  to  accumulate 
TALG  in  subtotals.  A  total  of  21  bins  are  provided  within  RECORD; 
20  accumulated  algorithm  count  subtotals  (ALGCNT(l) . . .ALGCNT(20) ) , 
and  the  last  accumulated  OTHCNT.  To  use  the  20  algorithm  bins,  a 
mechanism  was  needed  to  easily  change  the  current  algorithm  bin  and 
thus  redirect  the  algorithm  counts.  Three  new  types  of  calls  to 
RECORD  allowed  both  permanent  and  temporary  bin  changes.  The  bin 
number  for  algorithm  time  can  be  changed  only  when  the  algorithm 
timer  is  stopped.  The  new  bin  number  specifies  the  bin  which  will 
recieve  future  algorithm  PSCPU  counts.  The  bin  number  was  added  to 


the  formal  parameter  list  of  RECORD. 


For  permanent  changes  in  the  desired  bin  number,  the  new  bin 
number  can  be  specified  either  in  a  start  call,  or  by  using  the  new 
set-bin  call.  Each  of  these  calls  causes  RECORD  to  discard  the 
current  algorithm  bin  number  and  start  using  the  new  number.  The 
difference  in  the  calls  is  that  the  set-bin  call  does  not  turn  the 
algorithm  timer  on;  this  is  still  done  only  on  start  calls. 

Sometimes  only  a  temporary  change  in  the  bin  number  is 
desired.  This  frequently  happens  when  a  routine  calls  a  subroutine 
which  may  need  to  change  the  bin.  Ve  wanted  to  avoid  having  the 
instrumented  routines  remember  and  reset  the  desired  bin  number 
after  the  subroutine  is  finished.  Instead,  we  maintain  within 
RECORD  a  stack  of  bin  numbers  with  the  current  bin  on  top.  When 
the  subroutine  is  called  it  performs  a  push-bin  call  to  RECORD. 
This  call  first  pushes  the  current  bin  number  onto  the  stack 
one  level  down.  Then,  it  puts  the  subroutine's  new  bin  number  on 
top  of  the  stack  as  the  current  bin.  Before  the  subroutine 
exits,  it  performs  a  pop-bin  call  to  RECORD,  which  bin  number  set 
by  the  subroutine  from  the  top  of  the  stack  and  replaces  it  with 


the  previous  bin  number  from  below 


The  subcategories  of  GA  steps  for  which  algorithm  time  was 
recorded  were: 

1.  Function  evaluations,  i  =  l...m 

2.  Function  evaluations,  i  =  m  +  1 

3.  Selecting  the  order  of  constraint  examination 

4.  Processing  record  points 

5.  Gradient  evaluations,  i  =  l...m 

6.  Gradient  evaluations,  i  =  m  +  1 

7.  Gradient  normalizations 

8.  Calculating 

9.  Calculating  the  vector  d 

10.  Updating  x  and  0^ 


PART  3 


ELLIPSOID  ALGORITHM  CONVERGENCE  USING  DEEP  CUTS 

3..1.  Deep  cats  Without  a.  Linesearch 

In  this  section,  we  describe  five  simple  methods  for 
constructing  a  deep  cut  hyperplane  which  do  not  require  a 
linesearch  to  be  performed.  We  then  perform  computational 
experiments  to  determine  how  each  of  these  methods  affects  the 
accuracy  and  efficiency  of  the  EA.  In  developing  the  methods,  we 
assume  that  the  functions  f^...fm+^  are  convex.  The  computational 
results  demonstrate  the  extent  to  which  the  cuts  degrade  algorithm 
accuracy  and  robustness  on  nonconvex  problems. 

3.1.1  Super-cuts  Using  Center  Data 

The  first  of  these  cut  techniques,  the  super-cut,  was  outlined 

by  Shot  in  [17].  Given  the  current  objective  function  value 

f  (xk),  suppose  that  the  current  record  value  fr  satisfies 
B+l 

fV., (xk)  =  f  (xk)  -  fr  >  0.  That  is,  the  current  centerpoint  xk 
m+1  n+i 

lies  outside  the  record  value  level  set  G.  We  define  the  suoer-cut 
Point 


Xs  =  Xk  -  <fV,<xk)/||  Vf  ^(xk)ll  )g  Al(xk), 


T 


where  f  (x  )  =  f  . (x  )  -  f  ,  and 
m+1  m+1 


where  the  L-2  norm  is  nsed.  We  construct  Hv  by  selecting  the  cut 
point  and  cut  gradient  to  be 


c  k 
x  =  x 

and  g°  =  8ffl+1(xk). 


s  It 

The  super-cut  point .  x  ,  is  the  point  along  -g  (x  )  where  a 

m+1 

linear  approximation  to  at  x^  yields  a  function  value  equal  to 

that  of  the  level  set  G.  Using  this  definition  and  the  support 

+ 

inequality,  it  is  easily  shown  that  contains  G,  and  thus  a 

super-cut  is  solution-preserving.  This  cut  is  one  of  the  two  deep 
cuts  where  the  cut  gradient  is  not  the  gradient  at  the  cut  point. 
Since  the  gradient  at  the  centerpoint  is  used  for  constructing  H^, 
we  call  this  cut  the  super-cut.  using  center  data.  Note  that  the 
super-cut  point  may  or  may  not  be  in  E^. 


For  these  super-cuts  the  depth  of  cut  simplifies  to 


« -  £I+i<‘k>'«7W*k,)TQi  Vfm+i<‘k»1/2 


which  is  nonnegative.  Computationally,  for  this  super-cut  there  is 


no  need  to  actually  calculate  the  super-cut  point,  since  a  and  g 


3., ,1.2  Kellev-cuts  Using  Center  Data 

When  C  G',  the  super-cut  of  3.1.1  creates  a  deep  cut 

v  k 

which  preserves  the  set  G.  The  value  fm+^(x  )  >  0  represents  the 
difference  in  objective  function  values  between  the  centerpoint  and 
the  level  set  G  which  is  to  be  preserved.  We  can  implement  an 
identical  cut  for  the  feasibility  constraints.  If  the  feasibility 
constraint  f^(x)  <  0  is  violated  then  xk  is  outside  the  associated 
feasible  set  S..  Here,  f.(x  )  =  f . (x  )  -  0  =  f . (x  )  >0  represents 

ill  i 

the  difference  in  the  constraint  function  values  between  the 
centerpoint  and  the  feasibility  set  which  is  to  be  preserved. 

s  k 

The  Kel lev-cut  point  x  is  the  point  along  -g^x  )  where  a 
linear  approximation  to  f^  at  xk  yields  a  function  value  equal  to 
that  of  the  boundary  of  level  set  (i.e.,  0). 

Xs  =  Xk  -  (fv(xk)/ll  Vf  .(Xk)ll  )g.(xk), 
l  ii 


where  the  L-2  norm  is  used.  This  Kelley-cut  also  uses 


c  k 
x  =  x 


c  ,  k. 
g  =  gi(x  ). 


Again,  there  is  no  need  to  actually  calculate  the  super-cut  point. 


c  k 

since  a  and  g  are  calculated  from  data  at  the  centerpoint  x  .  We 


3..1..3.  Snper/Kellev  Cats  Using  Local  Data 

Center  cats,  and  super/Kelley-cuts  asing  center  data  are 
solation-preserving  if  the  function  used  to  create  is 
convex.  Preliminary  experiments  indicated  that  saper/Kelley  cats 
on  nonconvex  problems  occasionally  caased  the  EA  to  converge  to  a 
nonoptimnm  point,  more  often  than  did  center  cats. 

For  center  cats,  H^.  is  created  asing  the  sapport  inequality  to 
a  level  set  which  contains  the  level  set  to  be  preserved. 
Super/Kelley-cuts  asing  center  data  are  different  in  two  ways, 
when  we  consider  their  behavior  in  the  nonconvex  case.  First,  if 

5 

f  ^  is  nonconvex,  we  do  not  know  whether  x  is  inside  or  outside  the 
level  set  to  be  preserved,  since  we  did  not  evaluate  f^x  ). 
Second,  the  cat  hyperplane  is  created  asing  the  centerpoint 
gradient  instead  of  the  gradient  at  the  cat  point. 

For  these  reasons,  we  considered  a  variant  where  saper/Kelley 
cats  are  performed  only  if  x  is  outside  the  set  to  be  preserved, 
and  where  we  nse  the  gradient  at  the  cat  point  instead  of  at  the 
centerpoint.  Ve  call  this  variant  saper/Kelley  cats  using  local 


data.  The  algorithm  is: 


1.  Calculate  z  as  above. 

2.  Evaluate  fV(xS)  and  test  if  xs  is  in  the  level  set: 

1 

If  fV(xS)  <  0,  then  go  to  4. 

^  g 

3.  Otherwise,  evaluate  g.(x  )  and  test  the  depth  of  cut: 

c  s  * 

X  =  X 

c  ,  s. 
g  =  g.(x  ) . 

Calculate  a 

If  a  <  0,  then  go  to  4. 


4. 


5. 


Go  to  5 . 

Make  an  alternative  (center)  cut: 
c  k 


x 


c 

g 


g.(xk) . 


a  =0 

Update  xk  and  0^. 


In  step  3,  we  test  for  nonnegativity  of  a  since  even  in  the  convex 
case  it  is  possible  that  the  super /Kelley-cut  point  has  a 
positive  directional  derivative  of  f^  along  -g i<xk).  In  step  4, 
and  subsequently  during  this  analysis,  we  use  a  center  cut  when  the 
deep  cut  being  tested  cannot  support  the  level  set  with  0  <.  a  <  1, 
although  other  deep  cuts  could  have  been  considered  for  use  under 


these  circumstances. 
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1.1.4  Extended-cats 

The  final  deep  cut  which  does  not  require  a  linesearch  is  the 
extended-cat.  Suppose  that  the  algorithm  has  found  a  record  point 
xr  and  that  at  a  subsequent  iteration  x^  €  G'.  Since  we  want  to 
create  a  deep  cut  hyperplane  supporting  G,  and  we  know  that  xr 
is  on  the  boundary  of  G,  we  choose 


c  r 
x  =  x 


and  gC  =  gn+1(*r) 


For  the  extended-cut,  H^.  is  thus  the  support  hyperplane  of  G  at  the 

r  k  r 

boundary  point  x  .  Depending  on  the  orientation  of  x  ,  x  ,  and  G, 

it  is  possible  that  (xk  -  X* ) Tgffl+1 ( xr )  <  0,  and  thus  a  <  0.  For 

this  reason,  when  the  depth  of  an  extended-cut  is  found  to  be 

negative,  the  extended  cut  is  not  used  on  that  iteration.  If  an 

extended-cut  has  u  >  1  then  xr  4  E^,  which  implies  that  0^  is 

no  longer  numerically  positive  definite. 


A  computational  saving  is  possible  when  using  extended  cuts. 
If  xC  is  the  current  record  point  at  which  we  would  like  to  make  an 

C  f  j 

extended  cut,  we  need  to  know  g  =  gn+^(x  ^ '  However,  x  became  a 
record  point  at  some  previous  iteration  when  it  was  the  ellipsoid 
centerpoint.  On  that  iteration,  gn+j(xr)  was  calculated  for  the 


optimality  cut.  When  extended  cuts  are  being  used,  we  store  the 
objective  function  gradients  of  record  points,  assuming  that  the 


3.1.1 


and  Resnll 


The  deep  cuts  above  can  be  combined  with  each  other  and  with 
center  cuts  in  many  combinations,  based  on  which  region  x  is  known 
to  be  in.  The  EA  outlined  in  Part  1  tested  xk,  and  classified 
x*  as  being  in  S',  or  in  S  H  G1,  or  in  S  H  6,  prior  to  making  a 
cut.  Both  types  of  super-cuts  and  the  extended-cuts  above  may  be 
used  whenever  we  know  that  x^ €  G'.  The  two  types  of  Kelley-cuts 
can  be  used  whenever  we  know  that  x^ €  S'.  If  x^  €  (S  D  G) ,  we 
make  only  center  cuts. 


For  this  preliminary  analysis,  we  tested  the  five  deep  cuts 
one  at  a  time.  This  avoided  the  excessively  large  number  of 
experiments  which  would  be  required  to  examine  all  possible 
combinations  of  cuts.  The  cut  being  tested  was  attempted  every 
iteration  that  x  was  in  the  appropriate  region.  Whenever  the 
cut  being  tested  could  not  be  used,  we  used  a  center  cut 
instead.  Also,  we  used  the  same  cut  throughout  the  trajectory 
from  beginning  to  end,  and  did  not  attempt  to  determine  how  the 
cuts'  effects  may  vary  if  used  only  for  part  of  the  trajectory. 


Table  3.1  summarizes  how  often  the  center  cut  EA  trajectory 
falls  into  each  of  the  regions  S',  S  H  G1,  and  S  fl  G  for  the  13 
problems.  These  frequencies  suggest  how  often  certain  cuts  may  be 
used.  For  example,  super-  and  extended-cuts  may  not  be 
used  often  on  Colville  3.  For  most  of  the  problems,  Kelley-cuts 


can  be  performed  dramatically  more  often  than  super-  or 


and  efficiency  information  has  been  extracted  from  those  figures 


and  summarized  in  the  tables  below. 

Table  3.2  displays  the  accuracies  attained  by  the  various 
nonlinesearch  deep  cuts  on  the  13  test  problems.  The  entries  with 

asterisks  are  significant  in  that  these  experiments  converged  to  a 

* 

point  other  than  x  .  Thus,  the  statistics  in  Tables  3.3  and  3.4 
for  these  experiments  may  be  suspect. 


Table  3.2  Accuracies  of  Deep  Cuts  Without  Linesearches 


1 

Cuts 

made  in  S  ft  G' 

1 

Cuts  made 

in  S' 

1 

1 

1 

1 

Super 

1  Super 

1 

Kelley  1 

Kelley 

problem  1 

Center 

1  (center) 

1  (local) 

Extend 

1 

(center) 1 

(local) 

1 

Col 

1 

1 

-16.83 

1 

-16.50 

1  -16.63 

-16.28 

1 

-16.88  1 

-16.85 

1 

Col 

2 

1 

-14.36 

1 

-15.14 

1  -15.39 

-14.61 

1 

-14.11  1 

-13.99 

1 

Col 

3 

1 

-15.11 

1 

-14.96 

1  -15.22 

-14.97 

1 

-14.89  1 

-15.02 

1 

Col 

4 

1 

-16.58 

1 

-3.61* 

1  -16.58 

-16.58 

1 

(-16.58) 1 ( 

-16.58) 

1 

Col 

8 

1 

-14.99 

1 

-15.85 

1  -15.37 

-15.85 

1 

-14.97  1 

-14.99 

1 

Dem 

lb 

1 

-8.30 

1 

-8.93 

1  -9.61 

-8.30 

1 

-8.72  1 

-8.96 

1 

Dem 

2 

1 

-14.48 

1 

-14.40 

1  -14.39 

-14.42 

1 

-14.42  1 

-14.43 

1 

Dem 

3 

1 

-14.38 

1 

-14.41 

1  -14.40 

-14.35 

1 

-14.32  1 

-14.28 

1 

Dem 

4a 

1 

-15.55 

1 

-15.49 

1  -15.68 

-15.04 

1 

-15.33  1 

-15.71 

1 

Dem 

5 

1 

-15.08 

1 

-14.74 

1  -14.65 

-14.67 

1 

-14.40  1 

-14.42 

1 

Dem 

6 

1 

-17.57 

1 

-17.52 

1  -17.41 

-17.54 

1 

-4.05*1 

-3.53* 

1 

Dem 

7 

1 

-13.36 

1 

-13.42 

1  -13.35 

-13.42 

1 

-9.05*1 

-8.83* 

1 

Dem 

8a 

1 

-14.64 

1 

-15.89 

1  -14.61 

-15.34 

1 

-14.66  I 

-14.63 

Notes. 

e 

.  Entries 

are  lowest  log 

relative 

combined 

solution 

error 

level  attained  by  the  algorithm. 

..  *  represents  experiments  which  converged  to  points  other 
than  x  . 

. .  The  entries  for  Kelley-cuts  on  Colville  4  are  in 
parentheses  to  denote  that  no  Kelley-cuts  were  made  because 
all  centerpoints  were  feasible.  Only  center  cuts  were 
made,  therefore  the  trajectory  was  identical  to  that  of  the 
center  cut  variant  in  column  1. 


None  of  the  five  deep  cuts  offered  a  significant  increase  in 
accuracy  relative  to  center  cuts.  Extended  cuts  were  equivalent  to 
center  cuts  in  accuracy.  Using  local  data  for  super-cuts  did 
increase  accuracy  on  one  problem  (Colville  4)  where  using  center 


data  caused  the  algorithm  to  converge  to  a  non-optimum  point. 


However,  using  local  data  on  Kelley  cuts  was  not  sufficient  to 

* 

obtain  convergence  to  z  on  Dembo  6  and  Dembo  7. 

Table  3.3  displays  some  important  statistics  collected  while 
conducting  the  experiments  on  these  five  nonlinesearch  cut 
variants.  For  example,  on  Colville  8  super-cuts  were  performed  7% 
of  the  iterations  when  center  data  (gradients)  were  used.  When 
local  data  were  used,  super-cuts  were  attempted  on  31%  of  the 
iterations  and  passed  the  function  value  and  nonnegative  depth  of 
cut  ttfsts  on  24%  of  the  iterations.  The  depths  of  cut  that  are 
displayed  are  only  for  the  iterations  on  which  the  deep  cuts  were 
successful.  For  the  remaining  iterations,  a  =  0. 


Table  3.3  Frequency  and  Depths  of  Deep  Cuts  Without  Linesearches 


T— - 

Cuts  made  in 

S 

n  g 

1  Cuts  made  in 

S' 

1 

1 

Super  I 

Super 

1 

1  Kelley  I 

Kelley 

1 

1  problem 

(center) 1 

(local) 

1 

Extended 

1  (center) 1 

(local) 

1 

1  Col 

1 

19  .0511 

19/16  .046 

1 

22/ 

8  .075 

1  65  .0191 

62/47 

.021 

1 

1  Col 

2 

16  .0281 

16/14  .029 

1 

19/ 

5  .017 

1  75  .0051 

76/60 

.005 

1 

1  Col 

3 

3  .0691 

3/  2  .071 

1 

3/ 

3  .083 

1  83  .0191 

80/70 

.019 

1 

1  Col 

4 

*18  .0371 

71/61  .027 

1 

85/ 

6  .032 

1  1 

1 

1  Col 

8 

7  .0531 

31/24  .090 

1 

32/10  .123 

1  40  .0271 

48/25 

.084 

1 

1  Dem 

lb 

6  .0341 

6/  6  .035 

1 

6/ 

6  .034 

1  86  .0121 

86/78 

.012 

1 

1  Dem 

2 

4  .1261 

5/  2  .115 

1 

4/ 

4  .102 

1  80  .0191 

80/74 

.019 

1 

1  Dem 

3 

6  .0471 

7/  4  .049 

1 

6/ 

4  .047 

1  80  .0201 

82/70 

.021 

1 

1  Dem 

4a 

10  .0351 

12/  7  .031 

1 

10/ 

7  .031 

1  76  .02lj 

76/69 

.020 

1 

1  Dem 

5 

14  .0381 

15/11  .042 

1 

16/ 

5  .031 

1  73  .0161 

74/64 

.018 

1 

1  Dem 

6 

26  .0441 

3/  2  .038 

1 

3/ 

2  .053 

1*12  .0991 

* 

1 

1  Dem 

7 

5  .0241 

6/  2  .028 

1 

6/ 

3  .024 

1*84  .0101 

*83/77 

.010 

1 

1  Dem 

8a 

16  .0331 

16/10  .033 

1 

16/ 

6  .030 

1  71  .0201 

71/69 

.019 

1 

Avg  a  : 

.0511 

.054 

1 

.057 

1  .0181 
-4— — - - 4.. 

.024 

1 

-4- 

Notes : 

..  The  entries  for  super-  and  Kelley-cuts  using  center  data  are: 

percentages  of  iterations  on  which  the  deep  cut  was  made, 
and  average  depth  of  cut  for  the  deep  cuts  that  were  made. 

..  The  entries  for  all  other  cuts  are: 

percentages  of  iterations  on  which  the  deep  cut  was 
attempted/ successful, 

and  average  depth  of  cut  for  the  deep  cuts  that  succeeded. 

..  *  denotes  that  the  statistics  may  not  be  meaningful, 
because  the  algorithm  converged  to  a  nonoptimal  point. 

..  The  entries  for  Kelley-cuts  on  Colville  4  are  blank.  No 
Kelley-cuts  were  made  because  all  centerpoints  were 
feasible . 

..  No  entries  are  given  for  Kelley-cuts  using  local  data  on 
Dembo  6  because  the  depths  of  cut  almost  immediately 
exceeded  1. 

..  Average  depths  of  cut  are  calculated  excluding  Colville  4, 
Dembo  6,  and  Dembo  7,  so  as  to  include  only  trajectories 


which  converged  to  optimal  points. 


For  super/Kelley-cuts  using  local  data,  the  the  frequency  of 


success  is  usually  less 

than 

the 

frequency 

of 

attempts . 

For 

nonconvex  problems.  xS 

may 

have 

been  in 

the 

level  set 

to  be 

preserved.  Even  on  convex  problems,  numerical  roundoff  errors  may 
indicate  that  xS  is  not  outside  the  boundary.  Also,  on  all 
problems,  a  negative  depth  of  cut  may  prevent  the  local  gradient 
from  being  used.  For  extended-cuts,  the  ratio  of  successes  to 
attempts  is  a  function  of  the  orientation  of  x*  and  xr  with  respect 
to  G.  The  frequency  statistics  can  be  used  to  decide  whether  the 
deep  cuts  were  successful  often  enough  to  allow  an  adequate 
evaluation  of  their  effects.  Some  frequencies  seem  small, 
especially  those  of  extended-cuts,  le  discuss  this  further  after 
the  efficiency  results  are  presented  in  Table  3.4. 

The  depths  of  cut  resulting  from  super-  and  ext snded-cuts  are 
almost  three  times  as  great  as  for  Kelley-cuts.  Also,  the  depths 
of  cuts  for  super/Kelley  cuts  using  loc-.i  data  appears  to  be 
essentially  the  same  as  when  center  data  is  used.  It  is  initially 
difficult  to  evaluate  the  significance  of  the  depth  of  cut  values, 
and  we  discuss  this  after  the  efficiency  results  are  available. 

Table  3.4  shows  the  relative  efficiency  of  the  five 


nonlinesearch  deep  cuts.  The  center  cut  trajectory  is  taken  as  the 


standard.  The  second  entry  for  example,  shows  that  on  Colville  1 


super— cuts  using  center  gradients  reached  each  error  level  in  an 
average  of  only  83%  of  the  time  that  it  took  center  cuts  to  reach 
the  same  levels. 


Table  3.4  Efficiencies  of  Deep  Cuts  Without  Linesearches 


H 

— 

Cuts 

made  in 

S 

n  g' 

H - 

1  Cuts  made  in  S' 

-+ 

1 

1 

1 

Super 

Super 

1 

1  Eelley  I 

Eelley 

1 

1  problem! 

Center 

(center) 

(local) 

1 

Extend 

1  (center) 1 

(local) 

1 

1  Col  1 

I 

1.00 

0.83 

0.96 

1 

0.82 

i  0.92  | 

1.03 

1 

i  Col  2 

1 

1.00 

0.87 

0.90 

1 

0.97 

1  0.93  I 

1.02 

1 

1  Col  3 

1 

1.00 

1.09 

1.12 

1 

1.06 

i  0.82  I 

1.03 

1 

1  Col  4 

1 

1.00 

1.07* 

1.18 

1 

0.99 

1  (0.99)  I 

(0.99) 

1 

1  Col  8 

1 

1.00 

0.87 

0.93 

1 

0.90 

1  1.01  1 

0.99 

1 

I  Dem  lb 

1 

1.00 

0.93 

0.98 

1 

0.93 

1  0.78  I 

1.38 

1 

1  Dem  2 

1 

1.00 

0.91 

1.02 

1 

0.97 

1  1.06  I 

1.36 

1 

1  Dem  3 

1 

1.00 

0.99 

1.00 

1 

0.96 

1  0.88  I 

1.08 

1 

1  Dem  4a 

1 

1.00 

0.93 

1.03 

1 

0.87 

1  0.80  I 

1.13 

1 

1  Dem  5 

1 

1.00 

0.89 

0.96 

1 

0.94 

1  0.86  I 

1.15 

1 

I  Dem  6 

1 

1.00 

0.96 

0.99 

1 

0.90 

1  0.29*  I 

0.22* 

1 

1  Dem  7 

1 

1.00 

0.93 

0.99 

1 

0.90 

1  0.75*  I 

1.04* 

1 

1  Dem  8a 

1 

1.00 

0.92 

1.02 

1 

0.93 

1  0.83  1 

1.18 

1 

Avg: 

1 

+- 

1.00 

0.93 

i__ - 

1.01 

1 

■+- 

0.93 

1  0.89  I 

j - 1- 

1.13 

1 

■+ 

Notes: 

..  The  entries  for  Eelley— cuts  on  Colville  4  are  in 
parentheses  since  no  Eelley-cuts  were  made  because  all 
centerpoints  were  feasible.  Only  center  cuts  were  made, 
and  the  trajectory  was  identical  to  that  of  the  center  cut 
variant  in  column  1.  The  efficiency  values  are  included  to 
demonstrate  the  replicability  of  effort  measurements  with 
those  of  the  center  cut  variant. 

..  *  Super-cuts  using  center  data  on  Colville  4,  and  both 
Eelley  cuts  on  Dembo  6  and  7  were  excluded  from  the 
efficiency  averages  because  the  optimal  point  was  not 
found. 


6 


Table  3.4  shows  that  three  o£  the  deep  cut  variants  definitely 
improve  the  efficiency  of  the  EA.  It  is  interesting  that  extended- 
and  super-cuts  (center  data)  achieve  a  relative  efficiency  of  0.93 
while  Kelley— cuts  (center  data)  achieved  a  0.89  relative 
efficiency.  In  Table  3.3  we  saw  that  these  extended-  and  super-cuts 
were  usually  used  only  3%  to  16%  of  the  iterations,  while  these 
Kelley-cuts  were  usually  used  65%  to  80%  of  the  iterations. 
Evidently,  the  greater  depth  of  cut  of  the  extended-  and  super-cuts 
must  compensate  for  their  much  lower  frequency  of  use.  Also,  we 
can  now  conclude  that  these  frequencies  of  use,  while  low,  were 
high  enough  to  affect  EA  convergence. 

Although  depths  of  cut  such  as  0.05  do  not  seem  very  great, 
they  are  evidently  large  enough  to  noticeably  speed  algorithm 
convergence.  To  view  this  affect  in  another  way,  we  examine 
Colville  1  using  super-cuts  (center  data).  An  average  a  =  .051 
occurred  on  21%  of  the  iterations,  while  a  -  0  occurred  on  the 
other  79%  of  the  iterations.  The  last  record  point  was  found  at 
iteration  k  =  1413.  Using  the  formula  for  ellipsoid  volume 
reduction  with  n  =  5,  the  ratio  of  the  super-cuts  ellipsoid  volume 
to  the  center  cuts  ellipsoid  volume  after  each  had  performed  1413 

_g 

iterations  with  the  appropriate  depths  of  cut  was  3.8x10 


To  summarize,  of  these  five  deep  cuts  without  linesearches 


extended-cats  appear  to  be  the  only  cuts  which  offer  an  improvement 
in  efficiency  with  no  degradation  in  accuracy. 


Super/Kelley-cuts  using  center  data  offer  almost  uniformly 
better  efficiency  than  center  cuts.  However,  both  caused  a  loss  of 
accuracy  on  some  nonconvex  problems.  These  deep  cuts  could  be  used 
to  safely  improve  algorithm  efficiency  if  the  problem  being  solved 
is  known  to  be  linear  or  convex. 

Super/Kelley-cuts  using  local  data  (checking  that  xS  is  not  in 

g 

the  set  to  be  preserved  and  using  the  gradient  at  x  )  do  not  appear 
to  warrant  further  consideration.  They  do  not  improve  accuracy 
relative  to  center  cuts,  and  their  efficiency  usually  was  worse 


than  center  cuts. 


3.. 2  Deep  Cots 


Requiring  4  Linesearch 


When  the  centerpoint  z  is  outside  the  solution-containing  set 
$  0  6,  a  linesearch  may  be  used  to  find  a  deep  cut  point  where  the 


support 

inequality 

will  ensure 

that  (S  n  G)  C 

<• 

If  xk  e  S'  we 

perform 

a  linesearch 

in  a  descent 

direction  of 

f . 

l 

to  find  the 

boundary  of  S^,  where  i  is  the  index  of  the  violated  constraint. 
If  xk €  G'  we  perform  a  linesearch  in  a  descent  direction  of 
to  find  the  boundary  of  G. 

y 

We  use  the  f^  notation  often  here,  so  that  our  algorithms  will 

v 

apply  to  both  feasibility  cuts  and  optimality  cuts.  The  function  f^ 
is  defined  as 

fj(x)  for  i  =  1. .  .m 

Wx>  - fI  for  1  =  *  + 1 

and  represents  the  difference  between  the  values  of  f^  at  a  point 
x  and  on  the  boundary  of  the  level  set  to  be  preserved  by  the  cut. 

Searching  for  a  point  on  the  boundary  of  or  G  is  a  search 

v  k 

to  find  a  solution  of  f^(x)  =  0.  At  the  centerpoint  x  , 

v  k 

f^(x  )  >  0.  If  we  know  that  the  other  endpoint  of  the  line  segment 

y 

to  be  searched  has  f^(x)  <  0,  then  a  zero-finding  linesearch 

y 

subroutine  can  be  used  to  find  the  solution  point  of  f.(x)  =  0. 


fI<x>  = 


9 


v 

Otherwise,  we  £irst  search  the  line  segment  to  minimize  until  we 

y 

find  a  point  with  f^(x)  <  0  (inside  the  level  set).  Having  found 
an  interior  point,  we  perform  a  zero-finding  linesearch  to  find  a 

y 

point  where  f^x)  =  0  (on  the  boundary  of  the  level  set). 


Both  the  minimization  and  the  zero-finding  linesearch 
subroutines  search  a  line  segment 


xk  +  Xdg,  0  <.  Sq  <.  X  <.  bjj, 

where  X.a^.bQ  €  R*» 

and  xk,d  €  Rn. 

s 


ds  is  the  direction  of  search,  and  a^  and  b^  are  the  left  and  right 
endpoints  of  the  initial  interval  of  uncertainty,  in  which  the 
minimum  or  the  zero  of  the  function  lies.  Vhen  the  subroutines 


terminate  the  search,  the  values  a  and  b  for  the  current  interval 
of  uncertainty  are  returned.  In  addition,  some  of  the  routines 
return  a  third  value  A.  €  [a,b],  which  is  the  current  estimate  of 
the  minimizing  point  or  the  zero  of  the  function. 


The  minimization  subroutines  terminate  when  either  of  the 
following  criteria  is  satisfied: 


1.  The  level  set  has  been  penetrated 


T 


2.  The  subroutine  converged  to  a  reasonable 
approximation  of  the  minimum  point  (convergence  of 
the  interval  of  uncertainty,  convergence  of  the 
approximation  point,  zero  directional  derivative,  or 
maximum  number  of  linesearch  subroutine  iterations), 
without  penetrating  the  level  set. 

When  a  minimization  routine  terminates  because  the  level  set  was 
penetrated,  xk  +  Xdg  is  the  point  inside  the  level  set.  In  the 
minimization  subroutines,  the  left  endpoint  always  has  a  negative 
directional  derivative  of  f^  along  dg.  The  directional  derivative 
at  the  right  endpoint  is  known  or  presumed  to  be  positive.  The 
directional  derivative  at  X  may  be  positive  or  negative. 

The  zero-finding  subroutines  terminate  when  they  converge  to  a 
reasonable  approximation  of  the  boundary  of  the  level  set,  using 
convergence  criteria  similar  to  those  in  2.  above.  Here, 
fT(xk  +  ad  )  <0,  fT(xk  +  bd  )  >0,  and  fY(xk  +  Xd  )  may  be 

1  S  —  1  S  -  1  S' 

positive  or  negative.  j 

Suppose  that  the  zero-finding  subroutine  has  converged  to  the 
boundary  of  the  level  set.  We  must  select  which  of  the  two  or 

three  points  a,  b,  X  is  used  to  determine  the  deep  cut  point, 

c  c  c 

x  .  We  need  to  select  x  and  the  cut  gradient  g  so  that  the  cut 

is  solution-preserving,  and  also  so  that  a  2  0  for  the  best 

ellipsoid  volume  convergence. 


To  ensure  the  cut  is  solution-preserving,  we  select  a  point  on 


51 

c 

or  outside  the  boundary  as  the  cut  point  z  .  We  use  x  +  Xd  if 

v  k  k 

f.(x  +  Xd  )  >  0,  otherwise  we  use  z  +  ad  .  Then,  selecting 
1  s  s 

c  c 

g  =  g^(x  )  ensures  that  the  solution-containing  set  is  preserved 

by  the  support  inequality.  Since  both  x*  +  Xd  and  xk  +  ad  have  a 

s  s 

negative  directional  derivative  of  f^  along  ds>  a  _>  0. 

When  a  minimization  subroutine  terminates  without  having  found 

the  level  set  to  be  supported,  a  cut  point  must  still  be  selected. 

We  cut  at  x*  +  adg ,  the  left  endpoint  of  the  remaining  interval  of 

v  k 

uncertainty.  However,  if  f^  is  minimized  near  or  at  x  +  ad^,  then 

v  k 

the  directional  derivative  of  f.  at  x  +  ad  may  be  almost  0. 

1  s 

Thus,  the  depth  of  cut  may  be  almost  0  when  the  level  set  to  be 
supported  is  not  found.  The  tables  below  provide  the  frequencies 
with  which  the  level  set  was  found  versus  when  it  was  not  found,  to 
distinguish  the  deep  cuts  from  the  almost-center  cuts. 

Since  we  wanted  to  use  each  deep  cut  as  often  as  possible,  we 
perform  the  linesearch  cuts  for  both  feasibility  and  optimality 


deep  cuts 


3..2..1,  Search  Along  d 

There  are  several  ways  of  selecting  the  vector  along  which  the 
linesearch  is  performed.  Since  xk  is  ontside  the  level  set,  the 
linesearch  should  be  in  a  descent  direction  of  f^.  The  vector 


jC  _  a  c  # / /  c %Ta  c \ 1 / 2 

d  =  ~Q.g  /((g  )  Q^g  ) 


used  in  the  update  formulae  gives  rise  to  the  point 
c  c 

y  =  x  +  d  on  the  ellipsoid  boundary,  which  is  the  minimizing 

c  T 

point  of  the  function  (g  )  x  over  E^.  Similarly,  the  vector 


dk  =  -Qkg.(xk)/((g.(xk))TQkgi(xk))1/2 


k  k  k 

can  be  used  to  find  the  point  y  =  x  +  d  on  the  ellipsoid 

k  T  k 

boundary  which  minimizes  (g^x  ))  x  over  Ej_.  The  vector  d  is  a 
descent  direction  of  f^  since  is  positive  definite,  and  not  only 
specifies  a  search  direction,  but  also  specifies  how  far  to  search 
in  that  direction.  Ve  use  yk  as  the  endpoint  of  the  search,  which 
ensures  that  x  £  E^.  In  [12],  deep  cuts  are  performed  used  this 
search  vector.  The  algorithm  steps  are: 


1.  Initialize  for  the  linesearch  for  a  minimum: 
Let  d  ■  d. 


Y 


2.  Search  z  +  Xd  ,  X  e  [a_,b_],  for  a  minimum  of  f.  or 

s  u  0  1 

until  the  level  set  is  penetrated.  On  convergence: 

2.1  If  fV(xk  +  Xd  )  <0  was  found,  go  to  3. 

X  s 

2.2  Otherwise  (the  level  set  was  not  found),  cut  at  the 

current  left  endpoint: 

c  k  , 
z  =  z  +  ad 

s 

c  ,  c. 
g  =  gi(*  > 
stop 

3.  Initialize  for  the  linesearch  for  a  zero: 

Let  e  =  e  (bA  -  aA)/(b  -  X) 
z  z  0  0 


It  V 

4.  Search  z  +Xd  ,  X  €  [a_,b_],  for  a  zero  of  f.. 

sou  1 

On  convergence : 

4.1  If  fY(xk  +  Xd  )  <0, 

i  ,  s 

xC  -  x  +  ad 

s 

g  =  gjU  ) 
stop 

4.2  Otherwise, 

c  k  .  .  . 

z  =  z  +  Xd 

s 

c  ,  c. 
g  “  g£(*  ) 
stop 

tes : 

In  3,  the  interval  convergence  tolerance  for  the  zero¬ 
finding  linesearch,  e  ,  is  adjusted  to  remain  a  fraction 

z 

of  the  original  interval  of  uncertainty. 


3_.2.2  Search  Along  -g. 

Another  descent  direction  of  f.  that  can  be  used  as  a  search 

1 

direction  is  that  of  the  negative  gradient,  -g^x*).  When 
searching  along  d,  the  vector  d  gave  both  the  direction  and  the 
endpoint  of  the  search.  When  -g  is  used  instead,  we  must  determine 
how  far  along  -g  to  search.  It  is  difficult  to  determine  the 
intersection  of  the  boundary  of  and  the  ray  xk  -  Xg,  X  >  0. 

Further,  x  €  is  not  a  necessary  requirement  for 

solution-preserving  cuts.  Therefore,  our  method  selects  a  search 
endpoint  x^  +  bdg  without  testing  whether  it  is  in  E^. 

The  endpoint  of  search  is  selected  by  examining  points 

x*  +  bd^  with  increasing  values  of  b,  until  an  endpoint  is  found 

v  k 

which  satisfies  one  of  two  stopping  criteria.  If  f.(x  +  bds>  <  0, 

v 

then  the  level  set  has  been  penetrated.  If  f^  is  increasing  at 

+  bd  ,  then  the  depth  of  cut  is  negative  there,  so  any  cut  point 
s 

of  interest  is  to  the  left  of  x^  +  bdg .  Having  found  the  endpoint 
which  satisfies  a  stopping  criterion,  the  algorithm  uses  the 
minimization/ zero-finding  subroutines  in  a  manner  similar  to  when 
searching  along  d. 

We  wish  to  select  appropriate  points  x*  +  bds  to  examine.  We 

3  k  3 

use  the  vector  dg  s  x  -  x  ,  which  lies  along  -g  (x  represents  the 


Kelley-cut  point  if  i  *  l...m,  or  the  super-cut  point  if 
i  *  m  +  1).  If  f.  is  linear,  then  the  boundary  of  the  level  set  to 


be  preserved  is  at  x  +  bd  with  b  =  1.  If  f.  is  convex,  then  the 

s  1 

v  k 

boundary  cannot  occur  with  b  <  1.  We  first  test  f^(x  +  bd^)  with 
b  =  1,  and  then  with  values  of  b  that  increase  by  a  constant 
multiplicative  factor.  When  the  level  set  is  encountered  or  f. 
starts  to  increase,  +  bd$  is  the  right  endpoint  of  the  interval 
of  uncertainty. 

During  this  process  of  testing  out  along  increasing  values  of 

b,  two  other  points  are  maintained  to  determine  the  left  endpoint 

of  the  interval  of  uncertainty.  Let  X.  be  the  value  of  b  on  the 

preceding  step,  and  a  be  the  value  of  X  on  the  preceding  step. 

Initially,  a  =  X  =0.  When  the  level  set  is  penetrated,  the 

v  k 

boundary  must  be  between  X  and  b  (since  f ^(x  +  Xds>  >  0, 

otherwise  the  process  would  have  stopped  on  the  preceding  step). 
When  f^  starts  to  increase,  the  minimum  point  is  between  a  and  b. 

If  the  level  set  is  penetrated,  then  a  zero-finding  subroutine 

is  used  to  find  the  boundary  of  the  level  set,  as  when  searching 

along  d.  If  the  process  of  increasing  b  stopped  because  f^  was 

increasing,  then  a  minimization  subroutine  is  used  to  try  to  find  a 
v 

point  where  f^  is  negative,  also  as  when  searching  along  d. 

Thus,  the  linesearch  along  -g  differs  from  that  along  d  not 
only  in  the  search  direction  but  also  in  that  the  (right)  endpoint 
of  search  is  determined  by  a  process  which  tests  out  along  -g.  The 


search  along  -g  may  nse  different  left  endpoints  of  search  as  well, 
because  of  the  information  generated  during  the  testing-out 
process.  Once  both  endpoints  are  known,  the  use  of  the 
minimization  and  zero-finding  subroutines  is  then  similar  to  that 
of  searching  along  d.  Finally,  since  it  may  be  that  z  4  E^,  we 
need  to  test  whether  a  <.  1  before  using  the  ellipsoid  update 
formulae . 

The  specific  steps  of  the  algorithm  are: 

1.  Initialize  to  search  out  along  -g: 


57 


2.  Search,  out  along  — g  until  fY  increases  or  becomes  negative: 

2.1  Start  a  new  iteration: 

j  =  j  +  1 

2.2  Test  the  new  right  endpoint: 

2.2.1  If  fj(xk  +  bds)  <=  fT(xk  +  Xd^ ) ,  go  to  2.2.2 

Otherwise,  the  function  is  increasing  at  b: 

e  =  a  / (b  -  a) 
m  m 

s  =  e  / (b  -  a) 
it  z 


go  to  3 

2.2.2  If  fj(xk  +  bds)  >  0,  go  to  2.2.3 

Otherwise,  the  level  set  has  been  penetrated: 

e  =  a  / (b  -  X) 
z  z 


go  to  4 


2.2.3  The  function  may  or  may  not  be  increasing  at  b, 

see  whether  another  iteration  is  allowed: 

If  j  <  10,  go  to  2.2.4 

Otherwise,  cut  at  the  midpoint: 
xc  =  xk  +  Xd 

s 

C  ,  Cl 

g  =  gjU  > 

stop 

2.2.4  Prepare  for  a  new  right  endpoint: 

a  =  X 


go  to  2.1 


3.  Search  x  +  Xd  ,  X  €  [aA,bA],  for  a  minimum  of  f.  or 

sou  1 

until  the  level  set  is  penetrated.  On  convergence: 
v  k 

3.1  If  f.(x  +  Xd  )  <  0  was  found,  initialize  for  the 

1  s 

linesearch  for  a  zero: 

e  =  e  (bn  -  an)/(b  -  X) 
z  z  0  0 

aQ  =  a 

b0  =  X 
go  to  4 

3.2  Otherwise  (the  level  set  was  not  found),  cut  at 

the  current  left  endpoint: 

c  k  . 
x  =  x  +  ad 

s 

8  =  g^x  ) 

stop 

^  V 

4.  Search  x  +  Id^,  X  €  [a^.b^],  for  a  zero  of  f^. 

On  convergence : 

4.1  If  fT(xk  +  Xds)  <  0. 

xc  *  xk  +  ad 

s 

c  ,  c. 

8  **  8i(x  > 

stop 

4.2  Otherwise, 

c  k  ,  .  . 

x  “x  +  Xd 

s 

c  ,  c. 

8  *  8i<x  ) 

stop 

'tes: 

In  2.2.4,  preliminary  experiments  showed  that  an  expansion 
factor  of  two  was  excessively  large  (penetration  or 
reversal  usually  occurred  after  only  one  expansion).  The 
expansion  factor  shown  was  selected  so  that  three 

expansions  would  be  required  before  the  interval  had 

doubled. 


In  2.2.1,  2.2.2,  and  3.1,  e  is  the  interval  convergence 

ID 

tolerance  for  the  minimization  linesearch,  while  e  is  the 

z 

interval  convergence  tolerance  for  the  zero-finding 
linesearch  snbrontine.  These  tolerances  are  adjusted  to 
remain  a  fraction  of  the  original  interval  of  uncertainty. 


If  we  have  a  record  point  z  ,  then  x  €  (S  H  G) .  When 

(S  H  6),  a  search  of  the  line  segment  xr  -  x*"  will  find  the 

boundary  of  or  G  as  desired.  This  is  also  a  descent  direction, 

under  convexity  of  f ..  As  when  searching  along  d,  the  vector 

dg  =  xr  -  x*  provides  both  the  direction  and  the  endpoint  of 

search.  Another  advantage  of  this  search  direction  is  that  the 

r  lc 

boundary  of  the  level  set  is  sure  to  be  crossed  along  x  -  x  , 
whereas  it  need  not  be  crossed  when  searching  along  d  or  -g. 

However,  this  is  the  only  linesearch  cut  where  the  algorithm 
differs  for  feasibility  versus  optimality  cuts.  There  are  two 

extra  steps  required  for  the  optimality  cuts,  both  of  which  are 

caused  by  the  fact  that  xr  is  known  to  be  on  the  boundary  of  G. 

First,  the  extended-cut  of  3.1.4  is  the  same  as  the  cut  that 

would  result  if  a  linesearch  of  xC  -  z^  finds  a  boundary  minimum  of 
v  r 

f .  at  x  .  To  increase  algorithm  efficiency,  we  first  perform  the 
test  for  this  boundary  minimum,  and  perform  the  extended-cut  (if 
possible)  to  avoid  the  linesearch. 

Second,  when  the  extended-cut  is  not  possible,  the  algorithm 
can  not  proceed  directly  to  the  zero-finding  linesearch  because 
there  are  two  zeros  of  f^  in  the  interval  (the  right  endpoint  x  is 
a  zero  of  f,,  but  has  positive  directional  derivative  of  f,  along 


d  ).  Therefore,  a  minimization  search  mast  be  done  to  find  a  point 
inside  of  G,  which  will  be  ased  as  the  right  endpoint  for  the 
zero-finding  subroutine. 

The  steps  of  the  algorithm  are: 


1.  Test  whether  this  is  a  feasibility  or  an  objective  cut: 


1.1  If  i  <  m  +  1,  initialize  for  the  linesearch  for  a  zero: 


Let  d  =  xr  -  xk 
s 


aQ  =  ° 


b0  =  1 


go  to  4. 

1.2  If  <xr  -  xk)Tgm+1(xr)  >  0,  go  to  2. 

1.3  Otherwise, 

make  an  extended-cut 
stop 


2.  Initialize  for  the  linesearch  for  a  minimum: 
Let  d 


r  k 
x  -  x 


a0  =  ° 


T  V 

3.  Search  x  +  Xd  ,  X  €  [an,bn],  for  a  minimum  of  f.  or 
s  u  u  1 


until  the  level  set  is  penetrated.  On  convergence: 


3.1  If  fY(xk  +  Xd  )  <  0  was  found,  initialize  for  the 

X  9 


linesearch  for  a  zero: 
8 


8z(b0  "  a0)/(b  ‘  a) 


a 

X 


v0 
go  to  4. 

3.2  Otherwise  (the  level  set  was  not  found),  cut  at 
the  current  left  endpoint: 


The  three  methods  of  constructing  the  linesearch  interval 
above  form  the  basis  for  the  EA  variants  we  consider  in  this 
section.  However,  before  testing  the  three  linesearch  directions 
themselves,  #e  must  determine  how  the  minimization  and 
zero-finding  portion  of  each  search  is  to  be  performed. 
There  are  two  decisions  to  be  made.  First,  which  of  the  possible 
subroutines  for  minimization  and  zero-finding  should  be  used. 
Second,  for  each,  what  convergence  tolerance  should  be  used  (i.e., 
how  close  to  the  boundary  to  attempt  to  get). 

Our  approach  was  to  select  one  search  direction  to  use  while 
testing  the  various  subroutine /tolerance  combinations.  The  vector 
d  was  chosen  for  this  testing.  Then  we  would  investigate  each  of 
the  possible  subroutines  at  several  tolerances.  The 
subroutine /tolerance  combination  which  had  the  highest  efficiency 
would  subsequently  be  used  in  the  primary  experiments  comparing  the 
three  search  directions. 

There  are  two  linesearch  minimization  subroutines  which  we 
investigated.  The  first  is  Kupf erschmid' s  implementation  of 
Muller's  method  [12]  which  uses  both  gradient  and  function 
evaluations  during  the  minimization  process.  This  subroutine  was 
used  when  EA  deep  cuts  along  d  were  examined  in  [12],  Those 
preliminary  results  indicated  that  deep  cuts  using  Muller's  method 


did  not  result  in  increased  efficiency.  Therefore,  part  of  this 
analysis  was  to  see  how  simpler  linesearch  subroutines  would 
compare  with  more  complicated  ones.  For  this  reason,  the  second 
minimization  subroutine  investigated  was  a  modified  golden  section 
method  which  uses  function  evaluations  only.  When  used  by  our 
three  deep  cut  linesearch  algorithms,  the  function  values  at  the 
left  and  right  endpoints  of  the  interval  of  uncertainty  are  often 
known.  The  primary  modification  was  to  use  this  knowledge  to  aid  in 
positioning  the  interior  points  (only  one  interior  point  is  used 
while  the  endpoints  and  the  interior  point  have  monotonically 
decreasing  function  values). 

There  are  three  linesearch  zero-finding  subroutines  which  we 
investigated,  all  of  which  use  only  function  evaluations.  The 
first  is  Muller's  method  [12].  The  second  is  the  bisection 
method.  The  third  method  is  an  adaptive  hybrid  of  the  regula  falsi 
and  bisection  methods.  This  hybrid  performs  like  regula  falsi  if 
the  interval  of  uncertainty  is  decreasing  well,  but  adapts  to 
perform  more  like  bisection  when  the  interval  of  uncertainty  is  not 
decreasing  well.  The  hybrid  method  is  explained  in  Appendix  A. 

We  also  needed  to  decide  which  convergence  tolerances  to  use 
for  the  interval  of  uncertainty  in  the  minimization  and  in  the 
zero-finding  subroutines.  For  this  analysis,  we  selected 
tolerances  of  0.1,  0.01,  and  0.001  of  the  length  of  d  .  We  did  not 


require  the  tolerances  for  the  minimization  subroutine  to  be 
identical  to  that  of  the  zero-finding  subroutine.  These  tolerances 
also  affect  the  other  convergence  criteria  mentioned  in  3.2.  Since 
Muller's  method  sometimes  rapidly  finds  the  neighborhood  of  the 
minimizing/ zero  point,  but  is  then  slow  to  reduce  the  interval  of 
uncertainty,  we  exit  the  subroutine  when  the  approximations  to  the 
minimum/ zero  are  unchanging  within  the  same  interval  tolerance. 
Finally,  we  set  the  maximum  number  of  iterations  for  the  subroutine 
to  be  the  number  of  iterations  that  a  bisection  method  would 
require  for  the  same  interval  convergence  tolerance.  Subroutine 
termination  due  to  reaching  the  maximum  iteration  limit  seemed  to 
occur  only  at  the  looser  convergence  tolerances. 

Preliminary  results  demonstrated  that  the  13  test  problems 
varied  greatly  in  how  often  each  of  the  two  (minimization  and 
zero-finding)  subroutines  was  utilized.  For  the  purpose  of 
deciding  which  the  best  subroutines  and  tolerances  are,  we  selected 
test  problems  which  would  repeatedly  exercise  both  the  minimization 
and  the  zero-finding  subroutines.  This  allowed  a  more  rigorous 
test  of  the  subroutines  (with  a  decrease  in  the  number  of 
experiments  required). 

To  determine  how  frequently  each  test  problem  uses  the 
minimization  and  the  zero-finding  subroutines,  we  ran  the  search 
along  d  method  using  Muller's  subroutines  for  minimization  and 


zero-finding  with  0.01  convergence  tolerances  for  each 
subroutine.  Table  3.5  shows  the  percentage  of  iterations,  from  the 
first  iteration  through  the  final  record  iteration,  on  which  each 
subroutine  was  used  separately  or  together.  The  entries  of  0  for 
minimization  subroutine  usage  mean  that  every  time  y^  was 
calculated,  it  was  found  to  be  inside  the  level  set,  so  the 
algorithm  proceeded  directly  to  the  zero-finding  routine. 


Table  3.5  Frequency  of  Linesearch  Subroutine  Usages  by  Problem 


investigating  which  linesearch  subroutine /tolerance  is  most 
efficient.  The  Table  3.5  information  demonstrates  that  the 
zero-finding  routine  was  used  often  on  all  of  the  solved  problems. 
Thus,  we  chose  problems  which  would  most  often  use  the  minimization 
subroutine.  Of  the  Colville  problems,  numbers  1  and  4  were 
therefore  selected.  The  first  Dembo  problem  selected  was  Dembo 
8a.  Dembo  lb  is  next  highest  in  minimization  subroutine  usage,  but 
it  was  not  selected  so  as  to  avoid  including  only  convex  geometric 
NLPs.  Dembo  4a  was  therefore  selected  as  the  fourth  tesi  problem 
for  this  set  of  experiments.  The  problems  selected  include  two 
general  and  two  geometric  NLPs,  two  of  which  are  convex  and  two  of 
which  are  nonconvex. 

Having  determined  the  test  problems  to  use,  we  tested  the 
subroutine /tolerance  combinations.  There  are  6  combinations  of 
minimization  subroutine  and  tolerances,  and  9  combinations  of 
zero-finding  subroutine /tolerances .  We  chose  Muller's  method  at 
0.01  tolerance  as  the  default  combination  for  both  minimization  and 
zero-finding.  Thus,  6  experiments  were  run  testing  the  6 
minimization  subroutine /convergence  combinations,  all  of  which  used 
Muller's  method  and  0.01  for  the  zero-finding 
subrout ine /tole ranc e .  Similarly,  the  various  zero-finding 
subroutine/tolerance  combinations  were  tested,  all  using  Muller's 
method  and  0.01  tolerance  when  minimizations  were  performed. 


In  Appendix  D2,  Figures  D2.1  through  D2.12  display  the 


error-versus-ef fort  plots  for  the  minimization  subroutines,  while 
Figures  D2.13  through  D2.24  display  plots  for  the  zero-finding 
subroutines.  Table  3.6  displays  the  accuracy  attained  by  each 
subroutine  and  tolerance  combination.  The  nonconvexity  of  Colville 
4  caused  convergence  to  a  nonoptimal  point  in  8  of  the  16 
combinations.  Otherwise,  no  combination  appears  to  have  a  uniform 
advantage  if  accuracy  is  used  as  the  criterion. 

Table  3.6  Accuracies  for  Linesearch  Subroutines/Tolerances 


Table  3.7  displays  the  efficiency  of  each  subroutine  and 
tolerance  combination,  relative  to  center  cuts.  We  chose  between 


competitive  subroutines  using  the  criteria  of  EA  efficiency.  This 


may  or  may  not  equate  directly  to  linesearch  efficiency,  depending 
on  whether  each  subroutine  achieves  roughly  comparable  depths  of 
cut  at  each  convergence  level. 


Table  3.7  Efficiencies  for  Linesearch  Subroutines /Tolerances 


H - 

1 

1 

Problem 

-+ 

1 

1 

1  Subroutine /Tolerance 

Col  1 

1 

Col  4 

1 

Dem  4a 

Dem  8a 

1 

1  Muller  min  0.1 

1.82 

1 

2.26 

1 

2.40 

2.84 

1 

1  Muller  min  0.01 

1.84 

1 

2.29 

1 

2.44 

2.85 

1 

1  Muller  min  0.001 

1.87 

1 

2.03 

1 

2.37 

2.91 

1 

1  Golden  min  0.1  + 

1.56 

1 

3.63 

1 

1.62 

1.77 

1 

1  Golden  min  0.01 

1.78 

1 

4.32 

1 

1.63 

1.80 

1 

I  Golden  min  0.001 

1.57 

1 

5.00 

1 

1.63 

1.83 

1 

1  Muller  zero  0.1 

1.63 

1 

2.77 

1 

2.34 

2.72 

1 

1  Muller  zero  0.01 

1.84 

1 

2.29 

1 

2.44 

2.85 

1 

1  Muller  zero  0.001 

1.82 

1 

4.36 

1 

2.48 

2.89 

1 

1  Bisection  zero  0.1 

1.60 

1 

2.06 

1 

2.39 

2.70 

1 

1  Bisection  zero  0.01 

1.96 

1 

2.17 

1 

2.53 

2.88 

1 

1  Bisection  zero  0.001 

1.99 

1 

4.96 

1 

2.73 

3.15 

1 

1  Regula  zero  0.1  + 

1.53 

1 

2.04 

1 

2.37 

2.79 

1 

1  Regula  zero  0.01 

1.47 

1 

2.46 

1 

2.37 

2.75 

1 

1  Regula  zero  0.001 

l„  — ■■  - -  1 

1.66 

1 

3.46 

1 

2.47 

— 

2.95 

L_ - 

1 

Notes : 

..  +  represents  the  optimal  subroutine/tolerance  combination 
selected  in  each  category. 


The  efficiency  data  of  Table  3.7  was  then  analyzed  to 
determine  which  of  the  subroutines/tolerances  should  be  selected  in 
the  minimization  category,  and  in  the  zero-finding  category. 
Table  3.8  contains  the  efficiency  statistics  used  to  select  the 
optimal  linesearch  subroutine/tolerance  combinations.  The  first 


four  columns  are  the  rank  orders  of  the  efficiency  values  for  each 
problem.  The  fifth  column  has  asterisks  on  the 
subroutine/tolerance  combinations  with  nondominated  rank  vectors. 
From  these  nondominated  combinations,  we  selected  those  with  the 
lowest  values  of  sums  of  ranks,  and  sums  of  efficiencies  (the  sixth 
and  seventh  columns).  Asterisks  in  these  last  two  columns  mark 
those  values  which  are  near— minimum. 

Among  the  minimization  subroutines/tolerances,  the  golden 
section  with  0.1  tolerance  is  clearly  superior  to  the  other 
alternatives,  and  was  therefore  chosen  for  the  linesearch  direction 
experiments  to  follow.  Among  the  zero-finding  linesearch 
subroutines/tolerances,  the  hybrid  regula  falsi  with  0.1  tolerance 
was  chosen,  although  the  bisection  with  0.1  tolerance  was  almost  as 


efficient 


2 


3^2.5.  Experiments  and  Results 

Having  determined  the  best  minimization  subroutine/tolerance 
and  the  best  zero-£inding  subroutine /tolerance ,  the  three  search 
direction  algorithms  were  tested  on  all  13  test  problems.  Figures 
D3.1  through  D3.13  of  Appendix  D3  display  the  error-versus-ef fort 
plots  for  the  deep  cuts  using  linesearches . 

Table  3.9  displays  the  accuracies  attained  by  the  linesearch 
cuts  on  the  test  problems.  As  previously  seen  with  the 
nonlinesearch  deep  cuts,  the  algorithms  sometimes  converge  to 
nonoptimal  points,  on  nonconvex  problems. 


Table  3.9  Accuracies  for  Deep  Cuts  Using  Line  searches 


I 

1 

1 

Search 

S  Search 

1 

Search,  I 

1 

problem  I  Center 

1 

d 

1  -g 

1 

r  x  i 

x  -  x  | 

I  Col  1 
I  Col  2 
I  Col  3 
I  Col  4 
I  Col  8 
I  Dem  lb 
I  Dem  2 
I  Dem  3 
I  Dem  4a 
I  Dem  5 
I  Dem  6 
I  Dem  7 
I  Dem  8a 


-16.83 

-14.36 

-15.11 

-16.58 

-14.99 

-8.30 

-14.48 

-14.38 

-15.55 

-15.08 

-17.57 

-13.36 

-14.64 


-16.66 

-2.99* 

-15.16 

-15.72 

-14.86 

-9.06 

-14.43 

-14.31 

-15.26 

-14.49 

-17.53 

-10.62 

-14.86 


-16.38 

-14.81 

-15.01 

-16.58 

-1.44* 

-8.82 

-14.45 

-14.20 

-16.27 

-4.32* 

-2.74* 

-6.99* 

-14.61 


-15.55 

-15.09 

-14.92 

-16.58 

-14.71 

-9.04 

-14.42 

-14.15 

-15.16 

-3.80* 

-17.56 

-13.25 

-14.53 


. .  *  denotes  those  problems  on  which  the  algorithm  did  not 
converge  to  the  optimum  point. 


Table  3.10  shows  how  often  each  of  the  three  linesearch  cuts 
were  made,  how  often  the  level  set  was  actually  found  and 
supported,  and  the  average  depth  of  cut  for  those  iterations  on 
which  the  level  set  was  supported.  For  example,  on  Dembo  5, 
searches  along  d  were  attempted  92%  of  the  iterations.  41%  of  the 
iterations  resulted  in  a  deep  cut  where  the  level  set  was  found  and 
supported.  The  remaining  51%  resulted  in  a  near-center  cut  because 
the  level  set  was  not  found.  The  column  for  searching  xr  -  x*  has 


in  parentheses  the  percent  of  iterations  on  which  an  extended  cut 


was  performed  to  avoid  the  linesearch 


Table  3.10  Frequency  and  Depths  for  Deep  Cuts  Using  Linesearches 


I  problem 


I  Col  1 
I  Col  2 
I  Col  3 
I  Col  4 
I  Col  8 
I  Dem  lb 
I  Dem  2 
I  Dem  3 
I  Dem  4a 
I  Dem  5 
I  Dem  6 
I  Dem  7 
I  Dem  8a 


Search 

d 


82/41 
93/71 
81/68 
90/  5 
55/46 
96/  1 
85/26 
90/21 
94/  5 
92/41 
94/31 
93/39 
91/  0 


.024 

.014* 

.021 

.124 

.018 

.030 

.033 

.027 

.024 

.016 

.009 

.008 

.110 


Search 


Searcji 
i  -  x 


82/82 

95/95 

80/80 

79/51 

62/62 

80/80 

85/85 

89/89 

34/34 

86/86 

89/89 


71/68 

93/92 

79/75 

72/57 

57/52 

88/75 

80/67 

83/72 

82/66 

35/32 

88/76 

85/76 

86/70 


(  2) 

(  3) 

(  0) 

(  1) 

(  2) 

(  7) 

(  1) 

(  3) 

(  8) 

(  D* 

(  2) 

(  3) 

(  4) 


Avg  o:  +| 


.049  I 


.023  I 


otes: 

.  The  entries  for  searches  along  d  and  along  -g  are: 
percentages  of  iterations  on  which  the  deep  cut  was  tried,/ 
percentages  of  iterations  on  which  the  deep  cut  found  and 
supported  the  level  set,  and 
average  depth  of  cut  for  the  cuts  that  did  support  the 
level  set. 

.  The  first  three  entries  for  searching  along  xr  -  x*  are  the 
same  as  those  above.  The  entries  in  parentheses  are  the 
percentage  of  iterations  on  which  an  extended  cut  was  made, 
and  thus  no  linesearch  was  performed. 

.  *  denotes  those  problems  on  which  the  algorithm  did  not 
converge  to  the  optimum  point. 

.  The  average  depth  of  cut  figures  do  not  include  Colville  2, 
Colville  8,  Dembo  5,  Dembo  6,  or  Dgmbo  7,  because  of  the 
searches  which  did  not  converge  to  x  . 


.  +  notes  that  the  average  depth  of  cut  for  searching  along  d 
is  biased,  because  several  problems  which  had  the  lowest 
frequency  of  deep  cuts  (e.g.,  0%  to  two  digits)  had  the 
greatest  a. 
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Note  that  the  level  set  is  found  and  supported  most  often  by 
the  searches  along  -g  and  xr  -  and  least  often  by  searches  on 
d.  The  frequencies  and  depths  of  cut  for  searching  xr  -  x^  and 
searching  -g  are  very  similar.  The  depths  of  cut  for  searching 
along  d  appear  somewhat  deeper  than  along  -g  or  along  xr  -  xk,  but 
the  0.051  average  a  for  searches  along  d  is  biased  upward  by 
several  problems  with  unusually  deep  cuts  on  a  very  small  number  of 
iterations . 

The  depths  of  cut  attained  here  can  be  compared  with  those  of 
Table  3.3  for  nonlinesearch  deep  cuts.  For  example,  both  the 
super-  and  Kelley-cuts  position  the  cut  point  along  -g,  as  does  the 
search  along  -g  technique.  The  algorithms  here  performed 
linesearches  for  both  feasibility  and  optimality  deep  cuts.  We 
could  weight  the  super-  and  the  Kelley-cut  depths  of  cut  from 
Table  3.3  (0.051  and  0.018)  to  approximate  the  depth  of  cut  if 
super/Kelley  cuts  were  both  used.  The  result  appears  to  be  the 
same  depth  of  cut  that  is  achieved  here  (0.023)  only  after 
considerable  linesearch  effort. 

Table  3.11  shows  the  relative  efficiencies  for  each  of  the 
three  search  direction  algorithms,  on  the  13  test  problems.  After 


Table  3.7  above  was  presented,  we  chose  to  combine  the  golden 
section  minimization  at  0.1,  and  the  hybrid  regula  falsi  at  0.1, 
although  the  two  had  not  been  tested  together.  When  we  compare  the 


search  along  d  efficiencies  here  with  those  of  the  same  four 


problems  in  Table  3.7,  we  see  that  nsing  these 
subroutines/tolerances  together  was  indeed  better  than  using  either 
of  them  by  itself. 


Table  3.11  Efficiencies  for  Deep  Cats  Using  Linesearches 
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1 

1 
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Search 

d 
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“8 
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-r 

1 
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1 

1.00 
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1.23 
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1 

Col  2 

1 
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0.96 

0.97 

1 

1 

Col  3 

1 
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1.15 

1 

1 

Col  4 

1 

1.00 

1.45 
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1 

1 

Col  8 

1 

1.00 

1.22 

1.18  • 

1.02 

1 

1 

Dem  lb 

1 

1.00 

1.37 

1.46 

0.97 
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1 

Dem  2 

1 

1.00 

1.47 

1.67 

1.41 

1 

1 

Dem  3 

1 

1.00 

1.26 

1.27 

1.05 

1 

1 

Dem  4a 

1 

1.00 

1.39 

1.29 

1.00 

1 

1 

Dem  5 

1 

1.00 

1.45 

1.57  • 

1.31  * 

1 

Dem  6 

1 

1.00 

1.26 

1.39  • 

1.01 

1 

Dem  7 

1 

1.00 

1.30 

1.06  • 

0.86 

1 

1 

Dem  8a 

1 

1.00 

1.48 

1.36 

1.14 

1 

Avg: 

1 

+- 

1.00 

1.42 

— 

1.39 

h— - - 

1.15 

h- - 

1 

+ 

Notes: 

. .  *  denotes  those  problems  on  which  the  algorithm  did  not 
converge  to  the  optimum  point. 

..  The  average  efficiency  figures  do  not  include  Colville  2, 
Colville  8,  Dembo  5,  Dembo  6,  or  Dem^o  7,  because  of  the 
searches  which  did  not  converge  to  z  . 


The  deep  linesearch  cuts  tested  do  not  increase  algorithm 
accuracy  in  any  systematic  way.  In  fact,  they  occasionally  cause 
convergence  to  a  nonoptimal  point  if  the  problem  is  nonconvez. 


Farther,  they  almost  uniformly  degrade  algorithm  efficiency 
relative  to  center  cats.  The  best  of  the  deep  linesearches  appears 
to  be  the  search  along  xr  -  x^  method.  However,  a  percentage  of 
the  deep  cats  done  by  this  algorithm  are  extended-cats  where  the 
linesearch  is  not  performed,  and  where  efficiency  is  better  than 
center  cats.  Thus  it  is  possible  that  the  relative  efficiency  of 
this  algorithm  on  the  iterations  where  linesearches  were  required 
was  worse  than  that  reflected  above,  to  yield  the  above  efficiency 
when  extended-cut  and  linesearch  iterations  are  averaged. 


PART  4 


CONSTRAINT  EXAMINATION  STRATEGIES 

In  previous  sections  we  have  stated  that  we  determine  whether 
or  not  x^  €  S  and  find  a  violated  constraint  if  4  S,  without 
giving  an  explicit  way  of  doing  so.  Since  the  update  is  of  rank 
one,  only  one  violated  constraint  is  needed.  Ve  could  examine  all 
m  constraints  and  select  the  one  with  the  greatest  violation  (or 
some  similar  criterion) .  Instead,  to  reduce  the  number  of  function 
evaluations  required,  we  use  the  first  constraint  that  is  found  to 
be  violated.  Note  that  this  choice  does  not  produce  a  monotonic 

decrease  in  the  objective  function  values,  because  of  the 

*  k 

feasibility  cuts.  In  fact,  near  x  ,  every  feasibility  cut  moves  x 

in  a  direction  of  increasing  objective  function  values. 

Since  we  cut  using  the  first  constraint  found  violated,  the 
order  in  which  we  examine  the  constraints  in  search  of  a  violated 
one  affects  the  behavior  of  the  algorithm.  Ve  examine  several 
possible  orders  of  search  in  terms  of  their  affects  on  the 
robustness,  accuracy,  efficiency,  and  simplicity  of  the  EA. 

One  general  consideration  motivating  the  work  reported  below 
is  that  it  is  desirable  to  prevent  the  ellipsoids  E^  from  becoming 
highly  aspheric.  It  has  been  reported  that  in  problems  where  the 


catting  hyperplanes  can  take  on  only  certain  orientations,  as 
when  the  are  linear  (for  example,  see  [9]),  the  sometimes  do 
become  highly  aspheric.  Numerically,  this  effect  manifests  itself 
in  ill-conditioning  of  the  matrices  Q^,  and  it  can  prevent  the  EA 
from  obtaining  an  accurate  solution.  Also,  it  has  been 
conjectured,  see  [7],  that  the  robustness  of  the  EA  may  be  due  to 
the  ability  of  the  x*  to  sample  widely-separated  regions  of  the 
problem  space  early  in  the  solution  process.  This  explanation  for 
the  observed  robustness  of  the  EA  is  consistent  with  the  annealing 
theory  of  mathematical  programming  algorithm  robustness  suggested 
in  [11].  If  the  E^  become  highly  aspheric,  the  ellipsoid  centers 
x*  may  not  be  well  distributed  throughout  the  problem  space, 
resulting  in  a  loss  of  robustness.  Thus  it  seems  plausible  that 
constraint  selection  strategies  that  cause  the  hyperplanes  to  be 
parallel  or  to  take  on  a  limited  number  of  different  orientations 
may  decrease  the  accuracy  and  robustness  of  the  EA.  This  argues 
against  repeatedly  cutting  on  a  few  constraints. 

A  second  general  consideration  is  that  it  is  inefficient  to 
examine  constraints  that  are  not  active.  This  argues  for  a 
strategy  that  identifies  the  active  constraints  and  examines  only 
those.  However,  an  active  set  strategy  which  introduces 
significant  complications  in  the  algorithm  would  be  objectionable 


in  view  of  the  inherent  simplicity  of  the  remainder  of  the  EA. 


Table  4.1  is  included  here  because  the  set  of  constraints 


which  are  active  at  optimality  would  be  expected  to  affect  the 
efficiency  of  various  constraint  examination  options. 


Table  4.1  Test  Problems:  Active  Set  of  Constraints  at  x 


■i - 
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1 

15 

1 
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1 
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15 

1 

20 

1 

11 

1 
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3 

5 

1 

16 

1 

5 
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i 
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4 

4 

1 

8 

1 

0 

1 

i 

1  Colville 

8 

3 
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20 

1 

2 

1 

3,18 

i 

1  Dembo  lb 

12 

1 

3 

1 

3 

1 

1,2,3 
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1  Dembo  2 

5 

1 

9 

1 

5 

1 

2, 5, 7, 8, 9 

i 

1  Dembo  3 

7 

1 

15 

1 

6 

1 

1,3,6,7,9,15 

i 

1  Dembo  4a 

8 

1 

4 

1 

4 

1 

1,2, 3, 4 

i 

1  Dembo  5 

8 

1 

6 

1 

6 

1 

1,2, 3. 4, 5, 6 
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1  Dembo  6 

13 

1 

18 

1 

14 
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1.. .9,12,13,15,17,18 
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1  Dembo  7 

16 

1 

25 

1 

22 

1 

1.. .12,14,15,18. ..25 

i 

1  Dembo  8a 

7 

1 

4 

1 

2 

1 

-4- 

2.3 

i 

Notes: 

..  m+  is  the  number  of  constraints  active  at  optimality. 

Throughout  Part  4,  we  use  only  the  center  cuts  explained  in 

C  Q 

Part  1,  where  x  *  x  and  g  =  g^(x  ).  These  cuts  select  to  be 
the  support  hyperplane  to  =  {x  I  f^x)  <,  f^x*)}  (the  level  set 
of  f^  which  passes  through  x^) .  The  statements  in  Part  4 
concerning  the  solution-preserving  properties  of  certain  algorithms 


are  based  on  the  fact  that  these  are  the  cuts  that  are  used. 


In  this  section  we  consider  three  rules  for  deciding  the  order 
in  which  to  examine  the  m  constraint  functions  of  NLP  in  search  of 
a  violated  constraint,  namely  top-down  order,  cyclical  order,  and 
random  order. 

4.1..1.  Top-down  Order 

The  simplest  approach  is  to  always  examine  the  constraint 
functions  in  top-down  order.  Thus,  at  each  iteration  k,  f^(x^)  is 
evaluated  first.  If  f^(x  )  >  0  we  stop  the  search  and  use  f^  in 
constructing  H^;  otherwise,  f^(x^)  is  checked,  and  so  on.  The 
first  index  i  ±  m  for  which  f^x*)  >0  is  the  index  used  in 
constructing  H^,.  Of  course,  it  may  turn  out  that  fi(xk)  2  0  for 
i  =  l...m,  and  then  is  constructed  using  fm+^(x^).  If  the 
selected  constraint  i  is  near  the  top  of  the  list,  it  is  more 
likely  to  be  selected  again  on  subsequent  iterations  when  it  is 
again  violated.  The  disadvantage  of  the  top-down  strategy  lies  in 
the  possibility  that  constraints  near  the  top  of  the  list  will  be 
used  over  and  over  to  the  exclusion  of  other  constraints  that  may 
also  be  violated,  perhaps  decreasing  the  robustness  or  accuracy  of 
the  algorithm.  This  constraint  examination  strategy  was  used  in 
the  EA  implementation  of  [12]. 


4.1.2  Cyclical  Order 

The  next  strategy  we  consider,  cyclical  examination,  attempts 

to  insure  that  violated  constraints  near  the  bottom  of  the  list  are 

not  repeatedly  ignored  in  favor  of  those  near  the  top.  Let  i^  be 

the  index  of  the  last  constraint  function  examined  on  iteration  k. 

The  first  iteration  of  the  cyclical  strategy  is  identical  to  that 

of  the  top-down  strategy.  On  subsequent  iterations  however,  the 

first  constraint  to  be  examined  has  index  p  =  i^_^+  1,  or  p  =  1  if 

i.  ,=  m.  If  f  (x*)  >  0,  we  stop  the  search;  otherwise  the  search 
k-1  p 

continues  with  index  p  +  1  (or  1  if  p  =  m) ,  and  so  on.  If  a 
violated  constraint  is  not  found  within  m  constraint  function 
examinations,  then  x^  €  S  and  a  Phase  2  cut  is  made.  The  cyclical 
strategy  requires  essentially  no  increase  in  algorithm  complexity 
over  the  top-down  strategy.  In  view  of  the  general  considerations 
outlined  in  Part  4,  the  cyclical  strategy  might  be  expected  to 
increase  algorithm  robustness  and  accuracy.  The  ellipsoid 
algorithm  implementation  EA3  uses  a  cyclical  strategy  that  is 
identical  to  the  one  described  above,  except  that  the  first 
constraint  examined  on  iteration  k  +  1  has  index  1  if  is 

constructed  using  fn+^. 

1.1.1  Order 

The  third  strategy  we  consider,  random  constraint  examination, 
is  also  intended  to  prevent  the  ordering  of  the  constraints  in  the 
list  from  causing  some  constraints  to  be  used  to  the  exclusion  of 


others.  Before  the  first  iteration  we  generate  a  set  of  m 


pseudo-random  numbers,  index  them  1  to  m,  and  then  list  the  indices 
in  increasing  order  of  the  associated  random  numbers.  This  list  of 
randomly-ordered  indices  is  used  like  the  list  l...m  in  the 
cyclical  method  above.  A  new  randomized  list  is  constructed  at  the 
end  of  each  iteration  in  which  the  last  index  on  the  current  list 
was  examined.  An  increase  in  algorithm  complexity  and 
computational  effort  is  required  to  accomplish  the  randomization, 
but  algorithm  robustness  and  accuracy  might  be  improved  over  even 
the  cyclical  approach,  because  of  a  further  decrease  in  the 
sensitivity  of  the  algorithm  to  the  initial  ordering  of  the 


constraint  indices 


4.1.4  Experiments  and  Results 

Figures  D4.1  through  D4.13  of  Appendix  D4  show  the  performance 
of  the  cyclical,  top-down,  and  random  constraint  selection 
strategies  when  they  are  applied  to  the  13  test  problems,  and 
Table  4.2  summarizes  the  accuracy  and  efficiency  results. 


Table  4.2  Experimental  Results  for  the  3  Simple  Strategies 
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Notes : 

..  The  solution  accuracy  reported  is  the  log  of  the  lowest  E(x^) 
attained. 

..  The  algorithm  efficiency  reported  is  the  relative  efficiency 
s  defined  in  Part  2,  computed  using  the  cyclical  strategy  as 
variant  A. 


None  of  the  strategies  shows  a  clear  superiority  in  accuracy. 


and  the  three  strategies  are  equally  robust  in  the  sense  that  none 
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of  them  fail  to  solve  any  of  the  problems.  However,  the  relative 


efficiency  of  the  cyclical  strategy  is  clearly  superior  to  that  of 
the  random  strategy,  and  the  relative  efficiency  of  the  random 
strategy  is  in  turn  somewhat  better  than  that  of  the  top-down 
strategy. 

In  view  of  the  general  considerations  discussed  in  4.1,  it  is 
counter-intuitive  that  the  strategies  turn  out  not  to  differ  much 
in  terms  of  accuracy  or  robustness;  we  expected  that  robustness  and 
ultimate  accuracy  would  both  be  improved  by  the  elimination  of 
regular  patterns  in  the  order  of  constraint  examination.  The 
superior  efficiency  of  the  random  strategy  relative  to  the  top-down 
strategy  is  also  counter-intuitive,  in  view  of  the  computational 
effort  that  is  required  to  randomize  the  constraint  indices.  Also, 
it  is  surprising  that  the  top-down  strategy  is  not  better  than  the 
cyclical  strategy  for  Colville  2,  since  8  of  the  active  constraints 
in  that  problem  appear  in  the  top  half  of  the  list  of  indices. 
Finally,  we  expected  that  the  top-down  and  cyclical  strategies 
would  be  about  equally  efficient  on  Dembo  5,  since  that  problem 
(like  Dembo  lb  and  Dembo  4)  has  all  constraints  active  at 
optimality.  These  discrepancies  between  the  experimental  evidence 
and  our  intuitive  preconceptions  serve  to  underscore  the  importance 
of  computational  testing  in  the  evaluation  and  comparison  of  the 


methods 


Because  of  its  superior  efficiency,  the  cyclical  strategy  is 
used  in  the  remainder  of  this  paper  whenever  a  subset  of 
constraints  is  to  be  examined. 

4.1.5,  Cyclical  Examination  Strategy  Statistics 

As  described  in  2.3.1,  the  experimental  software  collected 
numerous  statistics  in  addition  to  the  performance  measurements  of 
error  and  effort.  Since  the  cyclical  constraint  examination 
strategy  will  be  used  now  as  the  standard  for  comparing  other  EA 
variants.  Table  4.3  display  several  statistics  for  the  cyclical 
strategy.  The  statistics  are  the  percentage  of  iterations  where  xk 
is  infeasible,  the  percentage  of  effort  used  to  determine  whether 
xk  is  feasible,  and  the  percentage  of  iterations  that  found  new 


record  points 


Table  4.3  Cyclical  Strategy  Algorithm  Behavior  Statistics 
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76 

22 

1  Dem6  1 

92 

20 

1  Dem7  1 

91 

19 

m 
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74 

24 

Note  the  high  percentage  of  effort  spent  evaluating  the  feasibility 
constraints.  This  was  the  motivating  factor  for  developing  the 
alternative  constraint  examination  strategies  of  4.2  and  4.3. 


4.2  Using  An  Active  S£l  Strategy 


In  addition  to  the  algorithm  behavior  statistics  reported  in 
Table  4.3,  ve  also  counted,  for  each  problem,  the  number  of  times 
that  the  cyclical  constraint  examination  strategy  causes  each 
function  to  be  used  in  constructing  H^.  Tables  4.4  and  4.5  show 
these  statistics  for  the  test  problems  Dembo  8a  and  Dembo  3. 


Table  4.4  Use  of  Functions  in  Constructing  for  Dembo  8a 
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For  Dembo  8a,  m  =  4,  thus  the  indices  1  through  4  are  the 
feasibility  constraints,  and  the  last  column  is  for  the  objective 
function.  The  best  solution  found  for  this  problem  occurred  at 
iteration  k  -  1890,  with  constraints  2  and  3  active  at  optimality. 
Note  that  after  about  200  iterations,  these  are  the  only 
constraints  ever  used  for  making  feasibility  cuts. 


Table  4.5  Use  of  Functions  in  Constructing  for  Dembo  3 
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The  best  solution  found  for  this  problem  occurred  st  iteration 
k  "  2845.  with  constraints  1,  3,  6,  7,  9,  and  15  active  at 

optimality.  Note  that  after  about  100  iterations  only  one  other 
constraint  (i  “  13)  plays  a  role,  and  after  about  700  iterations 
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Hie  statistics  reported  in  Tables  4.4  and  4.5  show  that  as  the 
EA  follows  its  convergence  trajectory  it  generates,  essentially  for 
free,  information  that  can  be  used  to  predict  which  constraints 
will  be  active  at  optimality.  In  this  section  we  develop  an  active 
set  strategy  based  on  this  predictive  capability,  and  show  that  it 
can  sometimes  dramatically  improve  the  efficiency  of  the  EA. 

Recall  that  the  EA  examines  constraints  at  each  iteration  to 
determine  whether  or  not  x^  €  S.  If  x^  S,  one  violated  constraint 
has  been  found  (there  may  be  other  violated  constraints  as  well, 
but  these  are  not  found  because  the  examination  stops  at  the  first 
one).  Suppose  the  algorithm  marks  each  constraint  that  is  found  to 
be  violated,  and  let  I*  be  the  set  of  constraints  that  have  been 
marked  after  some  iterations.  The  set  I+  is  the  current  active  set 
of  constraints,  and  at  some  point  we  could  begin  to  examine  its 
elements  before  those  in  the  inactive  set  I  =  {l...m}  \I+.  This 
idea  is  the  basis  for  our  active  set  strategy.  We  define  the 
feasible  regions  for  I+  and  I  as 

I+  *  0 
I+  =  0 

l‘M 


J  (x  I  fA(x)  i  0,  i  I+), 

l  Rn. 

f  (x  I  f.(x)  10,  i  I"). 


When  examining  constraints  to  determine  if  x  €  S  so  that  the 


correct  feasibility  or  optimality  cut  can  be  made,  we 

examine  the  constraints  in  I*  first.  If  a  violated  constraint  is 

found  in  I*,  so  that  xk  4  X+,  then  xk  $  S  and  we  make  a 

k  +  k 

feasibility  cut.  If  x  €  X  ,  however,  then  x  €  S  if  and  only 
if  x*  €  X  ,  so  whether  a  feasibility  cnt  or  an  optimality  cut  is 

required  depends  on  whether  xk  €  X  .  When  xk  €  X+,  we  can 

assume  either  that  there  are  constraints  in  I  that  will  be 

violated  during  future  iterations,  or  that  there  are  not.  If  we 
suspect  that  there  are  constraints  in  I  that  will  be  violated,  we 
should  test  whether  xk  €  X  ,  but  if  we  believe  that  I+  contains 

all  the  constraints  that  will  ever  be  found  violated,  we  merely 

k  k  - 

assume  that  x  €  S  without  testing  whether  x  €  X  .  We  now 

consider  the  consequences  of  adopting  each  of  these  policies  when 

their  underlying  assuaiptions  are  wrong. 

The  first  policy  assumes  that  some  constraints  in  I  may  be 
violated,  and  so  tests  whether  xk  €  X  .  If  the  assumption  is  wrong 
and  I  contains  no  violated  constraints,  performing  the  test  shows 
xk  €  X  and  some  computational  effort  is  wasted  on  unnecessary 
constraint  examinations.  However,  the  policy  guarantees  that  the 
cuts  made  will  be  solution-preserving  and  that  any  record  points 
that  may  be  found  will  be  S-feasible.  When  the  active  set  is 
changing  rapidly,  it  is  reasonable  to  assume  that  sometimes 
xk  4  X  ,  so  we  use  the  policy  of  checking  I  when  xk  €  X+. 
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The  second  policy  assumes  that  I  contains  no  constraints  that 
will  be  violated*  and  avoids  needless  work  by  not  testing  whether 
xk  €  I  .  When  X*  is  essentially  unchanging,  it  seems  reasonable  to 
assume  that  if  xk  €  X+  then  xk  e  S,  so  we  make  an  optimality  cut 
if  xk  £  X+  and  a  feasibility  cut  if  xk  £  X+.  Further,  we  declare 
xk  to  be  a  record  point  if  xk  €  X+  and  f  .,(xk)  <  fr,  even  though 
we  are  not  certain  that  xk  €  S. 

If  the  assumption  xk  €  X  is  wrong  (I  contains  a  violated 
constraint),  then  under  this  policy  an  optimality  cut  will  be  made 
when  a  feasibility  cut  is  required,  and  such  a  cut  may 
not  be  solution-pre serving.  If  xk  is  declared  to  be  a  new 
record  point  but  xk  $  X  ,  then  the  new  record  point  is 
S-infeasible  and  the  record  value  fr  is  incorrect.  To  distinguish 
between  record  points  that  are  known  to  be  S-feasible  and  those 
that  are  not  known  to  be  S-feasible,  we  call  the  former  true  record 
points  and  the  latter  mavbe-record  points.  Thus  a  record 
point  can  be  either  a  true  record  point  or  a  maybe-record  point. 

In  Section  1  we  showed  that,  without  an  active  set  strategy, 
the  EA  always  preserves  x*  (if  the  f^  are  convex).  An  active  set 
strategy  using  the  policy  of  not  testing  whether  xk  €  X  generates 
cuts  that  may  or  may  not  be  solution-preserving.  If  xk  £  X+  then 
xk  4  S,  and  the  resulting  feasibility  cut  is  solution-preserving 
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for  the  reasons  given  in  Section  1.  If  xk  €  X+  and  fm+^(xk)  >  fr» 
the  optimality  cut  also  preserves  x  (and  G  and  xr)  for  the 
reasons  given  in  Section  1.  In  fact,  an  optimality  cut 
when  x  €  X  and  fB+^(x  )  >  f  is  solution-preserving  even  if 
xk4l  ,  because  the  cut  preserves  G  (and  thus  preserves  x*). 


When  xk  €  X+  and  f  , (xk)  <  fr  so  that  xk  is  a  maybe-record 

m+l 

point,  what  the  optimality  cut  preserves  is 


G+  -  (x  I  £„,(*)  i  f 


+  V  —  V  J; 

and  the  presumed  solution  set  S  fl  G  .  If  x  £  X  ,  then  x  £  S,  x 
is  a  true  record  point,  and  the  cut  is  the  same  solution-preserving 
optimality  cut  described  in  Section  1.  However,  if  xk  4  X  ,  then 
x*  is  neither  feasible  nor  a  true  record  point  and  xk  4  (S  H  G+) . 
In  fact  (S  fl  G+)  =0  if  fn+1(xk)  <  *m+l^x  ^ ’  and  in  that  cas®  tlie 
cut  is  not  solution-preserving.  In  general,  of  course,  we  do  not 
know  the  value  of  f  ^(x  ),  so  if  we  are  not  checking  whether 
xk  €  X  we  have  no  way  of  knowing  for  sure  whether  (S  fl  G+)  *  0. 
Thus,  if  I  contains  active  constraints,  the  policy  of  not 
checking  whether  xk  £  X  can  produce  cuts  that  are  not 

solution-preserving  when  xk  €  X+,  fn+^(xk)  <  fr»  *nd  I  £  0. 


To  detect  the  occurrence  of  non-solution-preserving  cuts 
whenever  it  is  possible  that  one  has  been  made,  we  periodically 


check  the  latest  new  maybe-record  point  xr  for  S-feasibility.  If 

xr  €  X  then  xr  is  a  true  record  point.  Further, 

♦ 

x  £  (S  fl  G)  C  E^,  since  any  cuts  made  after  finding  x  are  sure  to 
have  been  solution-preserving. 


If  xr  $  X  then  we  backtrack  to  the  last  ellipsoid  known  to 
contain  a  nonempty  solution-containing  set.  If  a  true  record  point 
has  been  found,  all  subsequent  cuts  are  solution-preserving  until 

the  next  maybe-record  point  xr  is  found.  We  save  the  ellipsoid 

* 

then,  prior  to  the  cut  which  might  discard  x  .  If  a  backtrack  is 
later  required,  restoring  the  saved  ellipsoid  E^  ensures  that 
x  £  (S  D  G)  C  E^.  Having  backtracked  to  this  earlier  point  in  the 
trajectory,  we  move  the  offending  constraint  from  I  to  I+  and 
start  on  a  new  trajectory.  Backtracking  incurs  a  computational 
penalty,  but  it  will  never  occur  if  I+  and  I  have  been  correctly 
identified. 


The  performance  of  the  algorithm  is  affected  by  when  and  how 
often  the  active  set  strategy  checks  whether  xr  €  X  .  We  make  the 
check  only  at  points  x^  that  are  maybe-record  points,  because  if  x* 
is  to  be  checked,  iterations  after  x*  will  only  add  to  the  length 
of  the  backtrack  if  i  I  .  Further,  we  check  x^  before  updating 

xr  and  fr,  so  that  the  previous  record  point  is  also  available  for 

k  ”  r  1c  k 

testing.  If  x  £  X  we  let  x  =  x  (because  we  know  that  x  is  a 

true  record  point  and  (S  H  G+)?t  0),  and  we  continue  with  an 
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optimality  cut.  If  iM  X  ,  we  test  whether  xr  €  X  and  if  it  is 
then  no  backtrack  is  required  and  a  feasibility  cut  continues  at 
x*.  Otherwise,  we  backtrack  to  the  saved  ellipsoid.  Since  the 
ellipsoid  was  saved  at  an  I+-feasible  point,  we  test  for 
I  -feasibility  and  make  a  feasibility  or  optimality  cut  as 
required. 

Ve  have  now  developed  the  primary  elements  of  our  active  set 
strategy.  Given  an  initial  or  current  estimate  of  the  active  set 

+  j[  -f 

of  constraints  I  ,  we  test  whether  x  €  X  .  Ve  make 

feasibility  or  optimality  cuts  according  to  whether  xk  £  X  or 

x^  €  X+  (instead  of  according  to  whether  x^  £  S  or  x^  €  S)  in 
order  to  avoid  examining  the  constraints  in  I  ,  which  we  think 
are  inactive.  Because  constraints  may  drop  out  of  the 

active  set  as  the  ellipsoids  shrink,  we  periodically  revise  I+  on 
the  basis  of  the  violation  history  of  the  constraints  in  I+. 
If  a  constraint  was  never  found  to  be  violated  since  the  previous 
revision  of  I+,  it  is  dropped  from  I*. 

As  the  algorithm  progresses,  it  may  generate  maybe-record 
points  that  are  I+-feasible.  Some  of  these  record  points  are 
checked  as  they  are  found,  to  see  whether  they  are  I  -feasible  as 
well.  Others  are  not  tested,  to  avoid  constraint  function 

evaluations.  Since  cuts  at  maybe-record  points  can  fail  to 

♦  r 

preserve  x  ,  we  periodically  test  the  current  maybe-record  point  x 


to  see  if  xr  e  X  . 


If  x 


is  not  feasible,  the  algorithm  mast 


backtrack  to  a  point  where  it  was  previously  determined  that 
(S  fl  G) ^  0.  When  the  inactive  set  I  is  tested  and  one  of  its 
constraints  is  found  to  be  violated,  that  constraint  is  added  to 
I+. 


We  start  with  I+  empty,  and  add  a  constraint  to  I+  only  when 
all  of  the  current  set  of  active  constraints  are  satisfied.  This 
process  does  not  add  to  I+  any  constraint  which  is  redundant  in  the 
sense  that  it  is  not  needed  to  show  infeasibility.  Thus  the  active 
set  I*  we  construct  is  minimally  sufficient  in  that  constraints  are 
added  only  when  the  existing  set  does  not  suffice  to  show 
infeasibility.  This  increases  the  efficiency  of  the  active  set 
algorithm  because  if  redundant  constraints  were  added  they  would 
have  to  be  examined  when  I+  is  tested  on  every  iteration. 

The  two  processes  of  dropping  from  and  adding  to  I*  proceed 
simultaneously.  Checking  for  inactive  constraints  to  drop  does  not 
increase  the  complexity  of  the  algorithm  very  much,  and  requires 
only  a  small  amount  of  extra  computational  effort.  Checking  the 
constraints  in  I~  and  reassigning  violated  ones  to  I*  is  also 
simple  if  all  maybe-record  points  are  tested  for  I  -feasibility. 
Otherwise,  the  need  to  save  ellipsoid  data  and  backtrack  when 


necessary  adds  significantly  to  the  complexity  of  the  algorithm; 
however,  large  gains  in  efficiency  might  be  realized  by  avoiding 


some  of  the  tests. 


How  often  we  check  whether  constraints  can  be  dropped  from  I 
(a  drop-check) .  and  how  often  we  check  whether  constraints  mast  be 
added  to  I+  (an  add-check) .  are  parameters  that  can  be  varied  to 
control  the  behavior  of  the  algorithm. 

Consider  the  influence  on  algorithm  behavior  of  the  drop— check 
interval,  which  we  name  Ak  .  If  Ak  is  too  small,  a  truly  active 
constraint  may  appear  to  be  inactive.  Only  one  violated  constraint 
can  be  identified  per  Phase  1  iteration,  and  therefore  if  too  few 
iterations  occur  between  drop-checks  some  violated  constraints  may 
be  overlooked.  On  the  other  hand,  if  Ak  is  overly  large,  then 
constraints  that  are  truly  inactive  may  be  retained  in  I+  longer 
than  necessary,  causing  effort  to  be  wasted  in  needless  function 
evaluations  whenever  it  is  necessary  to  test  whether  x*  €  X+. 

To  set  a  plausible  lower  bound  on  Ak  ,  we  treat  the 
algorithm's  discovery  of  violated  constraints  as  a  random  process. 
Ve  model  the  discovery  of  constraint  violations  by  a  multinomial 
distribution,  assuming  that,  on  each  Phase  1  iteration,  each  active 
constraint  is  equally  likely  to  be  found  violated.  Then,  assuming 
that  all  of  the  constraints  in  I+  are  active,  we  calculate  Ak  as 
the  smallest  number  of  Phase  1  iterations  sufficient  to  make  the 
probability  >  .99  that  each  active  constraint  i.as  been  found  to  be 
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violated  at  least  once.  Farther  details  about  this  probability 
model  and  its  nse  in  setting  a  minimum  value  for  Ak  are  given  in 
Appendix  B. 


The  lower  bound  for  Ak  is  used  when  I  is  changing,  such  as 
on  initialization  and  when  a  constraint  has  just  been  dropped  from 
or  added  to  I+.  When  I*  is  unchanging,  we  increase  Ak  to  reduce 
the  frequency  of  drop-checks  and  thus  the  computational  effort 
expended  in  examining  and  reinitializing  the  vector  of  constraint 
violation  histories.  This  increasing  of  Ak  is  accomplished 
automatically,  by  doubling  Ak  whenever  a  drop-check  finds  that  all 
of  the  constraints  in  I+  are  still  active. 


Now  consider  the  add-check  interval,  Ak  ,  the  number  of 
maybe-record  points  that  must  occur  before  the  most  recent  one  is 
checked  for  I  -feasibility.  The  minimum  value  of  Ak+  is  1,  when 
every  maybe-record  point  is  to  be  checked.  Every  other  maybe-record 
point  is  checked  if  Ak+  =  2,  and  so  on.  If  Ak+  is  too  small,  then 
I  -feasibility  is  checked  frequently,  so  that  effort  is  wasted  if 
I  has  been  properly  identified.  On  the  other  hand,  if  I  contains 
a  constraint  that  will  be  active,  an  overly  large  Ak+  increases 
both  the  likelihood  that  backtracking  will  be  needed  and  the  length 
of  the  backtrack  if  one  is  needed. 


When  there  is  reason  to  suspect  that  I  has  not  yet  been 


if  k  <  Ak 
then  k  =  k  +1 
go  to  3 

else  set  k  =0 

check  constraint  violations  in  last  Ak  iterations 
if  all  constraints  in  I*  were  found  to  be  violated 
then  Ak"  =  2Ak“ 
go  to  3 

if  any  constraints  in  I*  were  never  found  violated 
then  drop  those  constraints  from  I+ 
set  Ak+  =  1 

set  Ak  at  its  lover  bound 
set  k+  =  0 

determine  if  xk  is  a  mavbe-record  point; 
if  xk  i  X+ 

then  let  i  be  the  index  of  the  violated  constraint 
go  to  5 

else  if  fm|^(xk)  <  fr 
then  go  to  4 


else  let  i  =  m  +  1 
go  to  5 
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4.)  xk  is  a  pavbc-rccord  point: 
if  k+  <  Ak+ 

then  save  if  required 
let  i  =  m  +  1 
go  to  5 

else  if  xk  €  X 

then  set  k+  =  0 

increase  Ak+  (see  Appendix  C) 
let  i  =  a  +  1 
go  to  5 

else  backtrack  if  required 

add  violated  I  -constraints  to  I+ 
set  Ak+  ■  1 

set  Ak  at  its  lower  bound 
set  k-  ■  0 

3.)  finish  this  EA  iteration: 

nake  the  cut  using  f^  to  create 
update  Q  and  x 

if  convergence  has  not  occurred 
then  go  to  2 
else  if  xr  4  X 

then  backtrack 
else  stop 


s 


Figures  DS.l  -  D5.13  of  Appendix  DS  contain  error  versus 
effort  curves  coaparing  the  active  set  strategy  to  the  cyclical 
constraint  examination  strategy  of  4.2.1  on  each  of  the  13  test 
problems,  and  the  results  are  summarized  in  Table  4.6. 

Table  4.6  Experimental  Results  for  the  Active  Set  Strategy 


accuracy 


Cyclic  I  Active 


efficiency 

- 1 - 

Cyclic  I  Active 


%  of  iters  i 
to  f ind+  I 
stable  1  I 


Col 

1  I 

-16.83 

1  -16.10 

1 

1 

Col 

2  I 

-14.36 

1  -14.33 

1 

1 

Col 

3  1 

-15.11 

1  -14.92 

1 

1 

Col 

4  1 

-16.58 

1  -16.58 

1 

1 

Col 

8  1 

-14.99 

1  -15.85 

1 

1 

Dem 

lb  I 

-8.30 

1  -8.87 

1 

1 

Dem 

2  I 

-14.48 

1  -14.53 

1 

1 

Dem 

3  1 

-14.38 

1  -14.41 

1 

1 

Dem 

4a  1 

-15.55 

1  -15.70 

1 

1 

Dem 

5  1 

-15.08 

1  -14.63 

1 

1 

Dem 

6  1 

-17.57 

1  -17.57 

1 

1 

Dem 

7  1 

-13.36 

1  -12.35 

1 

1 

Dem 

8a  1 

-14.64 

1  -14.71 

1 

1 

1 

0.73  I 

4.5 

1 

0.98  I 

71.1 

1 

0.75  1 

1.5 

1 

0.66  I 

0 

1 

0.43  1 

5.8 

1 

1.01  1 

3.3 

1 

0.83  1 

1.4 

1 

0.83  | 

38.9 

1 

1.01  1 

0.4 

1 

1.00  I 

1.9 

1 

1.00  I 

25.7 

1 

1.04  ! 

75.7 

1 

0.91  1 

21.7 

average  efficiency:  I  1.00  I  0.86 


Notes : 

. .  The  solution  accuracy  reported  is  the  log  of  the  lowest  E(xk) 
attained. 

. .  The  algorithm  efficiency  reported  is  the  relative  efficiency 
s  defined  in  2.2,  computed  using  the  cyclical  strategy  as 
variant  A. 


Using  the  active  set  strategy  causes  essentially  no  change  in 


algorithm  accuracy,  m  /m  is  the  ratio  of  the  number  of  constraints 


active  at  optimality  to  the  total  number  of  constraints.  Ve 
expected  the  active  set  strategy  to  improve  the  efficiency  of  the 
algorithm  most  on  problems  for  which  m+/m  is  small,  little  on 
problems  for  which  m+/m  is  close  to  1,  and  not  at  all  on  problems 
for  which  all  the  constraints  are  active  at  optimality. 

Table  4.7  repeats  the  efficiency  results  given  above,  with  the 
problems  rearranged  in  increasing  order  of  m+/m.  A  column  is  added 
to  show  the  best  possible  efficiencies  the  active  set  strategy 
could  have  achieved.  This  value  is  the  efficiency  that  would 
result  if  the  active  set  strategy  used  only  100(m+/m)%  of  the 
cyclical  feasibility-checking  effort  from  Table  4.3.  For  example, 
Colville  1  has  m+/m  =  .27  and  the  cyclical  strategy  spent  41%  of 
its  effort  evaluating  the  feasibility  constraints.  The  best 
possible  effort  expenditure  then  has  two  components.  All  the  other 
algorithm  steps  except  feasibility  checks  still  use  59%  of  the 
original  algorithm  time.  Feasibility  checks  could  be  reduced  to 
.27(41%)  *  11%  of  the  original  algorithm  time.  Thus,  the  active 
set  strategy  could  use  as  little  as  59%  +  11%  -  70%  of  the  cyclical 


strategy  effort 


Table  4.7  Active  Set  Efficiency  vs  m  /m 
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1  1 

i 

Actual 

Possible 

— I- 

1 

1  prob  1 

m+/m 

i 

Efficiency 

Efficiency 

1 

1  Col  4  1 

.00 

i 

.66 

.61 

1 

1  Col  8  I 

.10 

i 

.43 

.33 

1 

1  Col  1  I 

.27 

i 

.73 

.70 

1 

1  Col  3  1 

.31 

i 

.75 

.67 

1 

1  Dem  3  ! 

.40 

i 

.83 

.76 

1 

1  Dem  8a  I 

.50 

i 

.91 

.88 

1 

1  Col  2  I 

.55 

i 

.98 

.95 

1 

1  Dem  2  1 

.56 

i 

.83 

.82 

1 

1  Dem  6  1 

.78 

i 

.96 

1 

1  Dem  7  1 

.88 

i 

.98 

1 

1  Dem  5  1 

BP  W*!1  M 

i 

BfiBS  IE l  Mjgfjj. 

1.00 

1 

1  Dem  lb  I 

i 

1.01 

1.00 

1 

1  Dem  4a  I 

i 

1.01 

1.00 

1 

It  is  encouraging  to  note  that  the  efficiency  of  the  active 
set  strategy  vas  uniformly  close  to  the  best  possible  value.  The 
differences  are  attributable  to  several  factors.  First,  the  best 
possible  efficiency  figure  assumes  that  the  inactive  constraints 
are  never  evaluated.  The  active  set  strategy  though  must  actually 
check  these  constraints  when  testing  new  maybe-record  points. 
Second,  constraints  which  are  inactive  at  optimality  may  be  in  the 
active  set  early  in  the  trajectory  before  being  dropped  from  I*. 
Finally,  there  is  a  slightly  increased  overhead  for  the  active  set 
strategy  algorithm. 


The  experimental  results  thus  confirm  our  expectations,  and 
imply  that  the  active  set  strategy  is  not  much  help  on  problems 


where  more  than  about  3/4  of  the  constraints  are  active  at 
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optimality.  Of  course,  the  strategy  might  work  very  well  even 
where  only  one  or  a  few  constraint  functions  were  inactive  at 
optimality  but  were  also  dramatically  more  difficult  to  evaluate 
than  the  others. 

As  the  iterations  of  the  algorithm  progress,  the  active  set 
strategy's  current  estimate  of  m+,  which  we  call  m+,  starts  at  zero 
and  is  then  adjusted  repeatedly  until  a  sufficient  active  set  has 

been  identified.  On  problems  for  which  more  constraints  are  active 

* 

at  optimality  than  are  required  to  uniquely  determine  z  ,  the 
active  set  strategy  typically  omits  the  extra,  unneeded  constraints 
from  I+.  This  phenomenon  is  illustrated  on  test  problem  Dembo  6, 
for  which  m+/m  =  .67,  and  on  test  problem  Dembo  7,  for  which 
m+/m  =  .76  . 


Figures  4.1  through  4.13  show  the  variation  of  m+/m  with 
iteration  number  for  each  of  the  problems.  Dashed  horizontal  lines 
are  drawn  on  the  graphs  for  Dembo  6  and  Dembo  7  at  ordinate  values 
corresponding  to  m+/m  for  those  problems.  On  Colville  4  no 
constraints  are  ever  found  to  be  violated,  so  m+  =  0  for  the  entire 


solution  process 


§J- 

T.OO 


<  4 .  I  I - 

400.00  800.00  1200.00  1600.00 

Iterations 


— » 

2000.00 
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Figure  4.10  (»+)/*  vs  k:  Deabo  5 


Table  4.4  displayed  the  function  indices  used  to  create  for 
Dembo  8a,  using  the  cyclical  constraint  examination  strategy. 
Table  4.8  displays  the  same  information,  using  the  active  set 
strategy. 


Table  4.8  Use  of  Functions  in  Constructing  for  Dembo  8a 
When  Active  Set  Strategy  is  Used 


1 

1 

1 

1 

1 

1 

'  —  ------ ■■■—  T 

number  of  times  function  f.  was  used 
in  constructing 

1 

1 

interval  oi  + 
iterations  I 

Feasibility 

constraints 

1 

1 

i  =  1 

i  i  =  2 

i  i  -  3 

i  =  4 

i  i.5 

1 

1 

1-  100  1 

1 

1  32 

1  39 

0 

1  21 

1 

1 

101-  200  I 

5 

1  34 

1  36 

0 

1  18 

1 

1 

201-  300  j 

0 

1  34 

1  38 

0 

1  18 

1 

1 

301-  400  I 

0 

1  35 

1  38 

0 

1  18 

1 

1 

401-  500  | 

0 

1  37 

1  37 

0 

1  14 

1 

1 

501-  600  ! 

0 

1  35 

1  39 

0 

1  15 

1 

1 

601-700  1 

0 

1  37 

1  37 

0 

1  18 

1 

1 

701-  800  I 

0 

1  33 

1  39 

0 

1  15 

1 

1 

801-  900  j 

0 

1  34 

1  40 

0 

1  18 

1 

1 

901-1000  j 

0 

1  36 

I  38 

0 

1  17 

1 

1 

1001-1100  ! 

0 

1  36 

I  37 

0 

1  16 

1 

1 

1101-1200  1 

0 

I  37 

I  37 

0 

1  14 

1 

1 

1201-1300  I 

0 

!  36 

I  37 

0 

1  17 

1 

1 

1301-1400  j 

0 

1  36 

1  38 

0 

1  15 

1 

1 

1401-1500  I 

0 

1  36 

t  36 

0 

1  17 

1 

1 

1501-1600  I 

0 

I  35 

I  40 

0 

1  21 

1 

1 

1601-1700  j 

0 

1  34 

1  39 

0 

1  21 

1 

1 

1701-1800  j 

0 

1  35 

1  38 

0 

1  21 

1 

1 

1801-1900  I 

0 

1  36 

- 

I  38 
-1  .  ■ 

0 

1— — — 

1  18 
- 

1 

-4- 

Note  that  the  active  set  did  not  need  to  cut  on  the  fourth 


constraint,  which  the  cyclical  strategy  had  found  violated  once 
during  the  first  100  iterations. 


It  is  interesting  to  note  when  the  final  active  set  was 
identified  in  terns  of  cnts  being  made,  for  the  active  set  strategy 
versus  the  cyclical  strategy.  Table  4.6  stated  that  the  active  set 
strategy  did  not  have  I+  stable  until  22%  of  the  iterations  had 
passed,  although  the  cyclical  strategy  information  of  Table  4.4 
indicates  that  constraints  were  not  needed  after  the  first  200 
iterations  (perhaps,  10%). 

Table  4.8  demonstrates  that  the  active  set  strategy  is  no 
slower  than  the  cyclical  strategy  in  ceasing  to  make  cuts  on 
inactive  constraints.  However,  the  active  set  strategy  did  keep 
the  first  constraint  on  the  active  list  until  iteration  400, 
because  of  its  arrangement  of  the  drop-check  intervals. 


4.3.  Examining  a.  Record  Constraint 


In  Part  1  we  defined  a  solution-preserving  cnt  as  one  which 
ensures  that  x  €  E^+^,  under  the  hypotheses  of  NLP  and  convexity 
of  fi...f  +i»  We  demonstrated  that  feasibility  cuts  are 
solution-preserving  if  C  and  that  optimality  cuts  are 
solution-preserving  if  G  C  H^.  For  center  cuts,  this  means  that 
feasibility  cuts  when  xk  £  S  and  optimality  cuts  when  xk  €  S  are 
solution-preserving.  All  cuts  discussed  here  are  center  cuts  where 
x  =  x  and  g  *■  g  (x  ) . 

In  4.2  we  demonstrated  that  it  is  sometimes  possible  for  an 
optimality  cut  to  be  solution-preserving  even  when  x  €  S'.  In  this 
section,  we  specify  when  such  cuts  are  solution-preserving  and 
analyze  a  constraint  examination  strategy  using  such  cuts. 


In  general,  an  optimality  cut  is  solution-preserving  if 
G  C  H^.  For  center  cuts,  H^.  supports 


Ln+1  =  (x  I  f^U)  <  fm+1(xk)). 


m+1 


m+1 


Thus,  (center)  optimality  cuts  are  solution-preserving  if  G  C  L  ,, 

m+1 

that  is,  if  xk  €  G'.  Optimality  cuts  when  xk  €  G'  are  solution¬ 
preserving  regardless  of  whether  xk €  S. 
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To  determine  whether  x  €  G  we  test  whether  f  (x  )  <.  f  .  In 

m+l 

effect,  the  record  valne  gives  rise  to  a  new  constraint, 
f  ^(x)  i.  f*»  which  we  call  the  record  constraint . 

This  record  constraint  has  a  desirable  property.  As  new 
record  values  are  found,  the  record  constraint  becomes  more 
restrictive  and  is  therefore  always  in  the  active  set  of 
constraints.  Because  the  record  constraint  is  always  active,  it 
seems  plausible  that  examining  it  before  examining  the  feasibility 
constraints  may  speed  EA  convergence.  Below,  we  analyze  a  strategy 
which  tests  the  record  constraint  first. 

4.3..JL  The  Record-first  Strategy 

In  this  strategy,  we  first  test  to  see  whether  x*  6  G.  If 

xk  €  G',  we  make  an  optimality  cut.  If  not,  we  test  f, ...f  to  see 

1  ID 

if  x*  €  S.  If  xk  6  S',  we  make  a  feasibility  cut.  If  xk  €  (S  H  G) , 
a  new  record  point,  then  we  update  fr  and  make  an  optimality  cut. 
We  call  this  the  record-first  strategy. 

We  call  the  EA  presented  in  Part  1  a  feasibility-first 
strategy  since  it  first  determines  whether  x*  €  S.  If  x* e  S,  it 
then  evaluates  fn+j(x^)  to  determine  whether  x*  is  a  record  point. 
The  feasibility-first  strategy  thus  categorizes  x*  as  being  in  S', 
or  in  S  H  G’,  or  in  S  D  G. 
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In  Table  4.9,  note  that  x  is  a  member  of  one  of  the  regions 
S'  H  G',  S'  D  G,  S  H  G',  or  S  H  G.  Depending  on  which  region  x^  is 
in,  the  two  strategies  vary  as  to  which  set  x^  is  classified  as 
being  in,  which  cut  is  made,  and  how  many  function  evaluations  are 
required  to  make  the  classification  and  cut.  Suppose,  for  example, 
that  x*  €  (S  D  G').  The  feasibility-first  strategy  would  first 

V 

test  whether  x  €  S,  and  evaluate  all  m  functions  doing  so  since  x 
is  feasible.  Then,  f  ,,  is  evaluated,  so  m  +  1  function 

evaluations  were  required  before  the  feasibility-first  strategy 
makes  its  cut.  On  the  other  hand,  the  record-first  strategy  would 
first  evaluate  f  Since  the  record  constraint  is  violated  here, 

the  cut  can  be  made  after  only  1  function  evaluation.  Entries  such 
as  l...m  mean  that  a  violated  feasibility  constraint  may  be  found 
as  the  first  function  evaluated,  or  not  until  the  last  one  is 
checked. 
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Table  4.9  Comparison  of  Feasibility-  and  Record-First  Strategics 


If  is  a  member  of... 


Feasibility-first : 

H - 

1  s'  n  g' 

-t- 

1 

s'  n  g 

— t— 

1 

s  n  G' 

s  n  g 

— r 

1 

1 

1 

x*  classified  in 

1  S' 

1 

1 

1 

S' 

1 

1 

s  n  g' 

s  n  g 

1 

1 

1 

1 

cut  made 

1  Feas. 

1 

1 

1 

Feas . 

1 

1 

Optim. 

Optim. 

1 

1 

1 

1 

function  evaluations 
required  to  cut 

1  l...m 

1 

1 

1 

1  •  •  • ID 

1 

1 

m  +  1 

m  +  1 

1 

1 

If 

k  . 

c  is  a 

member  of. . 

Record-first : 

1  s'  n  g' 

1 

s'  n  g 

1 

s  n  g' 

s  n  g 

1 

1 

1 

xk  classified  in 

i  G* 

1 

1 

1 

s'  n  g 

1 

1 

G' 

s  n  g 

1 

1 

1 

1 

cut  made 

1  Optim. 

1 

1 

Feas . 

1 

1 

Optim. 

Optim. 

1 

1 

1 

1 

+- 

function  evaluations 
required  to  cut 

1  1 

1 

H - 

1 

1 

4- 

1 

■ 

m  +  1 

1 

1 

Note  that  the  strategies  differ  in  whether  a  feasibility  or  an 
optimality  cnt  is  made  only  if  x^  €  (S’  H  G').  We  do  not  know 
whether  one  cnt  is  better  than  the  other,  when  both  can  be  made. 
Our  policy  of  cutting  on  the  first  violated  constraint  does  not 
permit  strategies  where  both  record-  and  feasibility-constraints 
can  be  simultaneously  found  to  be  violated. 

There  are  considerable  differences  between  strategies  in  the 
number  of  function  evaluations  required  before  a  cut  is  made.  The 


record-first  strategy  saves  up  to  m  -  1  evaluations  if  x  is  in 
S'  H  G',  saves  exactly  m  evaluations  when  x^  is  in  S  D  G',  and 
requires  only  one  more  evaluation  when  x  is  in  S'  H  G.  How  this 
affects  our  experimental  results  will  depend  on  the  percentage  of 
centerpoints  in  each  region,  and  on  how  early  a  violated  constraint 
is  found  when  searching  the  m  feasibility  constraints. 


s 
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We  do  not  incorporate  an  active  set  strategy  for  the 
feasibility  constraints  so  that  we  can  more  easily  isolate  the 
improvement  dne  solely  to  nse  of  the  record  constraint.  The 
feasibility-first  strategy  used  for  comparison  purposes  will  be  the 
cyclical  method  of  4.1. 

Figures  D6.1  through  D6.13  of  Appendix  D6  display  the 
error-versus-ef fort  curves  for  the  13  test  problems,  comparing  the 
record-first  strategy  to  the  feasibility-first  strategy. 
Table  4.10  summarizes  the  accuracy  and  efficiency  findings. 


Table  4.10  Experimental  Results  for  the  Record-First  Strategy 


+- 

1 

1 

1 

prob 

H - 

1  accuracy 

-4— 

1 

efficiency 

— + 

1 

1  Feas  1 

Record 

1 

Feas  1 

Record 

1 

1 

Col 

1 

1  -16.83  1 

-16.60 

1 

1.00  1 

0.89 

1 

1 

Col 

2 

1  -14.36  I 

-14.33 

1 

1.00  1 

0.97 

1 

1 

Col 

3 

1  -15.11  1 

-14.98 

1 

1.00  1 

1.01 

1 

1 

Col 

4 

1  -16.58  I 

-16.58 

1 

1.00  1 

0.66 

1 

1 

Col 

8 

1  -14.99  I 

-15.85 

1 

1.00  I 

0.76 

1 

1 

Dem 

lb 

1  -8.30  1 

-9.28 

1 

1.00  1 

0.99 

1 

1 

Dem 

2 

1  -14.48  1 

-14.43 

1 

1.00  I 

1.01 

1 

1 

Dem 

3 

1  -14.38  1 

-14.46 

1 

1.00  1 

1.00 

1 

1 

Dem 

4a 

1  -15.55  1 

-15.40 

1 

1.00  1 

1.07 

1 

1 

Dem 

5 

1  -15.08  I 

-14.56 

1 

1.00  1 

0.96 

1 

1 

Dem 

6 

1  -17.57  1 

-17.50 

1 

1.00  1 

0.98 

1 

1 

Dem 

7 

1  -13.36  I 

-13.31 

1 

1.00  1 

1.05 

1 

1 

+- 

Dem 

8a 

1  -14.64  1 

H - 1- 

-15.24 

1 

— t— 

1.00  1 
- L 

1.01 

1 

— f 

Averaged  efficiency:  1  1.00  I  0.95  I 

^ - 1 - h 


Notes : 


. .  The  measure  of  accuracy  used  is  the  lowest  log 
relative  combined  solution  error  attained. 

..  The  measure  of  efficiency  used  is  PSCPU  time 
relative  to  PSCPU  time  used  by  the  feasibility-first 
(cyclical)  strategy. 

The  new  strategy  does  not  degrade  the  accuracy  of  the 
algorithm.  The  efficiency  is  slightly  degraded  on  Dembo  4a  and 
Dembo  7.  The  most  noticeable  improvements  in  efficiency  are  on 


Colville  4.  Colville  8.  and  Colville  1. 
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DISCUSSION  AND  CONCLUSIONS 


Previously,  the  ellipsoid  algorithm  had  been  shown  to 
correctly  solve  a  large  number  of  nonlinear  programming  problems 
(both  convex  and  nonconvex)  and  to  do  so  with  effort  which  is 
coaqpetitive  to  other  solution  techniques.  In  this  study,  we 
analyzed  variants  of  the  EA  to  determine  whether  we  could  improve 
the  accuracy  or  efficiency  with  which  it  solved  a  set  of  13  test 
problems.  There  were  two  main  types  of  variants,  those  which 
created  deep  cut  hyperplanes,  and  those  which  determined  the  order 
in  which  the  feasibility  constraints  were  examined. 

Five  deep  cut  EA  variants  were  tested  which  did  not  require 
using  a  linesearch  to  create  H^,  and  three  deep  cut  variants  were 
tested  which  did  require  a  linesearch.  Before  conducting  the 
linesearch  experiments,  various  linesearch  subroutines  and 
tolerances  were  tested  so  that  the  deep  cut  variants  used  the  most 
efficient  combinations.  None  of  the  deep  cut  variants  increase  the 
accuracy  of  the  algorithm  in  a  systematic  way.  Therefore,  the 
merit  of  each  will  be  based  on  the  efficiency  relative  to  center 
cuts,  and  the  number  of  problems  on  which  it  did  not  converge  to 
the  optimal  point.  Using  these  measures  of  merit,  two  of  the 
nonlinesearch  and  all  of  the  linesearch  deep  cuts  are  not 
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competitive  with  center  cats.  These  cats,  saper-cuts  using  local 
data,  Kelley-cuts  using  local  data,  and  the  searches  along  d,  -g, 
and  xr  -  x^,  degrade  algorithm  efficiency  and  sometimes  converge  to 
nonoptimal  points. 

Five  EA  variants  were  tested  which  examine  the  feasibility 
constraints  in  different  ways.  Three  variants  involve  different 
ways  to  examine  the  entire  set  of  feasibility  constraints.  One 
variant  uses  an  active  set  strategy  to  minimize  function 
evaluations.  The  final  variant  here  uses  the  record  objective 
function  value  to  impose  a  constraint  which  can  be  examined  before 
examining  the  feasibility  constraints.  Determining  merit  among 
these  five  variants  is  easy,  since  accuracy  can  be  eliminated  as  a 
criterion  (all  five  examination  strategies  attained  essentially 
identical  accuracies).  Thus,  the  only  measure  of  merit  remaining 
is  efficiency.  The  cyclical  strategy  was  found  to  be  more 
efficient  than  the  top-down  strategy  of  [12]  or  the  random 
strategy.  Both  the  active  set  and  the  record-first  strategies  had 
better  efficiencies  than  the  cyclical  (feasibility-first)  strategy. 

Thus,  three  deep  cut  and  two  constraint  examination  variants 
improved  the  EA  efficiency.  Table  5.1  compares  the  accuracies  of 
these  five  improved  variants  on  the  test  problems,  and  Table  5.2 
compares  the  efficiencies  relative  to  the  cyclical,  feasibility- 
first  strategy  using  center  cuts. 


Table  5.1  Summary  of  Accuracies  for  tbe  Improved  EA  Variants 


I  problem 


I  Col  1 
I  Col  2 
I  Col  3 
I  Col  4 
I  Col  8 
I  Dem  lb 
I  Dem  2 
I  Dem  3 
I  Dem  4a 
I  Dem  5 
I  Dem  6 
I  Dem  7 
I  Dem  8a 


Center 

Cyclic 


-16.83 

-14.36 

-15.11 

-16.58 

-14.99 

-8.30 

-14.48 

-14.38 

-15.55 

-15.08 

-17.57 

-13.36 

-14.64 


Super 

(center) 


-16.50 

-15.14 

-14.96 

-3.61* 

-15.85 

-8.93 

-14.40 

-14.41 

-15.49 

-14.74 

-17.52 

-13.42 

-15.89 


Extend 


-16.28 

-14.61 

-14.97 

-16.58 

-15.85 

-8.30 

-14.42 

-14.35 

-15.04 

-14.67 

-17.54 

-13.42 

-15.34 


Kelley 

(center) 


-16.88 

-14.11 

-14.89 

(-16.58) 

-14.97 

-8.72 

-14.42 

-14.32 

-15.33 

-14.40 

-4.05* 

-9.05* 

-14.66 


Active 


-16.10 

-14.35 

-14.92 

-16.58 

-15.85 

-8.87 

-14.53 

-14.41 

-15.81 

-14.63 

-17.57 

-12.35 

-14.71 


Record 


-16.60 

-14.35 

-14.98 

-16.58 

-15.85 

-9.28 

-14.43 

-14.46 

-15.45 

-14.56 

-17.50 

-13.31 

-15.24 


Tbe  measure  of  accuracy  used  is  the  lowest  log  relative 
combined  solution  error  attained. 


*  denotes  those  problems  which  did  not  converge  to  x  . 
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Table  5.2  Summary  of  Efficiencies  for  the  Improved  EA  Variants 


H - 1- 

1  1 

1  problem  I 

— 

Center 

Cyclic 

Super 

(center) 

1 

1 

Extend 

Kelley 

(center) 

■+- 

1 

1 

Active 

— 

Record 

~f 

1 

1 

1 

Col 

1 

i 

1.00 

0.83 

1 

0.82 

0.92 

1 

0.71 

0.86 

1 

1 

Col 

2 

1 

1.00 

0.87 

1 

0.97 

0.93 

1 

0.98 

0.96 

1 

1 

Col 

3 

1 

1.00 

1.09 

1 

1.06 

0.82 

1 

0.73 

0.98 

1 

1 

Col 

4 

1 

1.00 

1.07* 

1 

0.99 

(0.99) 

1 

0.67 

0.65 

1 

1 

Col 

8 

1 

1.00 

0.87 

1 

0.90 

1.01 

1 

0.63 

0.79 

1 

1 

Dem 

lb 

1 

1.00 

0.93 

1 

0.93 

0.78 

1 

0.97 

0.95 

1 

1 

Dem 

2 

1 

1.00 

0.91 

1 

0.97 

1.06 

1 

0.85 

0.98 

1 

1 

Dem 

3 

1 

1.00 

0.99 

1 

0.96 

0.88 

1 

0.83 

0.99 

1 

1 

Dem 

4a 

1 

1.00 

0.93 

1 

0.87 

0.80 

1 

1.00 

1.06 

1 

1 

Dem 

5 

1 

1.00 

0.89 

1 

0.94 

0.86 

1 

0.98 

0.95 

1 

1 

Dem 

6 

1 

1.00 

0.96 

1 

0.90 

0.29* 

1 

0.99 

0.98 

1 

1 

Dem 

7 

1 

1.00 

0.93 

1 

0.90 

0.75* 

1 

1.04 

1.05 

1 

1 

Dem 

8a 

1 

1.00 

0.92 

1 

0.93 

0.83 

1 

0.93 

1.00 

1 

Avg: 

1 

+- 

1.00 

— 

0.93 

1 

■+- 

0.93 

0.89 

1 

■+- 

0.87 

0.95 

— 

i 

-+ 

Notes : 

..  The  measure  of  efficiency  used  is  PSCPU  time  relative  to 
PSCPU  time  used  by  the  cyclical  strategy  using  center  cuts. 

. .  *  denotes  those  problems  which  did  not  converge  to  z  . 


When  examining  these  tables  to  determine  which  EA  variant  to 
implement,  the  first  question  is  whether  the  problem  or  subproblem 
is  known  to  be  convex  or  linear.  If  not,  perhaps  center  cuts 
should  be  used  in  preference  to  super-  and  Kelley-cuts.  The 
remaining  discussion  assumes  that  super-  and  Kelley-cuts  have  not 
been  eliminated  from  consideration. 


The  five  improved  variants  do  not  have  to  be  compared  against 
one  another  because  they  are  not  mutually  exclusive  to  use.  It  is 


true  that  either  an  eztended-cnt  or  a  super-cat  must  be  chosen  when 


an  optimality  cut  is  desired.  Extended  cuts  might  be  the  first 
preference  because  they  have  equal  efficiency  with  no  accuracy 
degradation.  Recall  that  extended-cuts  often  have  an  orientation 
which  would  cause  a  negative  depth  of  cut,  and  so  an  alternative 
optimality  cut  must  be  chosen.  Ve  selected  center  cuts  for  this 
analysis,  but  super-cuts  could  be  used  instead  to  gain  the  benefits 
of  using  both  improved  optimality  cuts. 

A  single  EA  variant  can  thus  combine  all  of  the  five  variants 

which  improve  efficiency.  In  particular,  the  record-constraint  is 

examined  first.  If  the  record  constraint  is  satisfied,  then  the 

feasibility  constraints  are  checked  using  an  active  set  strategy. 

r  v  T  r 

Optimality  cuts  are  extended-cuts  if  (x  -  x  )  8m+1(x  )  i.  0*  and 
super-cuts  using  the  centerpoint  gradient  if  not.  Feasibility 
cuts  are  Kelley-cuts  using  the  centerpoint  gradient. 

Some  of  the  individual  components  of  this  strategy  might  not 
be  expected  to  combine  their  improvements  in  an  additive  manner. 
For  example,  the  efficiency  gains  shown  here  for  both  the  active 
set  strategy  and  the  record-first  strategy  seem  to  come  from 
reducing  the  feasibility-checking  effort  of  Table  4.3.  We  note 
that  on  Colville  4,  which  has  no  active  feasibility  constraints, 
the  active  set  strategy  saves  about  33%  of  the  cyclical  effort,  and 
the  record-first  strategy  saves  about  35%  of  the  same  effort.  Thus 
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both  strategies  appear  to  have  saved  almost  all  of  the  39%  of 
effort  which  the  cyclical  strategy  expended  on  feasibility 
constraint  examinations.  Colville  8  and  Colville  1  were  the 
problems  on  which  both  of  these  variants  had  the  next  best 
efficiencies . 

On  the  other  hand,  some  of  the  individual  components  of  this 
proposed  strategy  may  well  have  a  synergistic  effect  when  used 
together.  For  example,  we  saw  what  a  marked  efficiency  improvement 
super-  and  extended-cuts  achieved  with  a  very  low  utilization 
frequency.  The  record-first  constraint  examination  strategy 
classifies  more  of  the  centerpoints  as  being  in  G',  so  more  super- 
and  extended-cuts  cuts  will  be  made. 

The  significance  of  this  is  in  the  efficiency  improvement 
which  is  possible  in  each  case.  Ve  demonstrated  that  the  active  set 
strategy  achieved  almost  all  of  the  possible  efficiency  improvement 
that  it  could,  and  that  its  improvements  were  linear  in  m+/m.  On 
the  other  hand,  doubling  the  percentage  of  iterations  on  which 
deep  cuts  are  made  results  in  squaring  the  volume  reduction 
relative  to  center  cuts. 

In  addition  to  suggesting  algorithms  which  combine  the  five 
efficiency-improving  variants,  there  are  several  other  extensions 


of  the  present  work  which  deserve  further  investigation. 


First,  the  success  of  the  active  set  strategy  presented  here 
suggests  that  it  might  be  worthwhile  to  study  whether  comparable 
improvements  in  efficiency  could  be  obtained  with  a  simpler  version 
in  which  every  maybe-record  point  is  checked  for  S-feasibility . 

Second,  we  have  treated  the  constraints  in  NLP  as  general 
nonlinear  constraints.  The  active  set  strategy  could  be  slightly 
improved  if  some  constraints  were  linear,  since  it  can  be 
analytically  determined  whether  a  linear  constraint  is  inactive 
over  E^.  If  the  functions  f^  are  linear  for  i  =  l...m+l  (i.e., 
if  NLP  is  a  linear  programming  problem)  then  when  the  active  set 

has  been  identified  the  corresponding  system  of  linear  equations 

* 

could  be  solved  to  find  x  . 

Finally,  the  record-first  strategy  developed  here  uses  the 
record  value  to  impose  a  constraint  on  the  objective  function 
value,  and  tests  this  constraint  before  the  feasibility 
constraints.  Ve  used  the  record  value  to  impose  the  objective 
function  constraint  since  our  formulation  of  NLP  assumed  that  no 
other  knowledge  of  fm+^(x  ^  *s  known.  On  some  problems,  a 
reasonably  tight  upper  bound  on  f  ^(x  )  can  be  predetermined 
(perhaps  from  a  physical  problem  being  modeled).  If  the  upper  bound 
is  f°,  then  our  record-first  strategy  could  be  generalized  into 
an  objective-first  strategy  where  the  objective  function  constraint 
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Appendix  A.  An  Adaptive  Hybrid  of  Regula  Falsi  and  Bisection 


Let  x(a)  =  x  +  ad  be  the  left  endpoint  of  a  line  segment, 
with  f(x(a))  >  0.  Similarly,  let  x(b)  =  x^  +  bd  be  the  right 
endpoint  of  the  line  segment,  with  f(x(b))  <  0.  Ve  want  to  find  A 
such  that  x(X)  is  an  approximation  to  the  zero  of  the  function. 


The  bisection  method  uses  the  approximation 


^  =  a  +  (b  -  a)/2. 


while  the  regula  falsi  method  uses  the  approximation 


Ar  =  a  +  f (x(a) ) (b  -  a)/(f(x(a>)  -  f(x(b))). 


Ve  use  a  hybrid  of  these  two,  attempting  to  achieve  the  faster 
convergence  of  regula  falsi  on  well-behaved  functions,  while 
retaining  the  faster  convergence  of  the  bisection  method  on  less 
well-behaved  functions. 


The  approximation  we  use  is  a  convex  combination  of  the  two 
approximations  above 


A  ■  aX  +  (1  -  o)A^ 


where  or  is  the  robustness  parameter.  If  the  robustness  parameter 
is  at  its  maximum  value  of  1,  then  the  regula  falsi  approximation 
is  used;  at  the  minimum  value  of  0,  the  bisection  approximation  is 
used.  The  initial  value  of  the  robustness  parameter  is  an  input  to 
the  subroutine;  we  set  it  at  1. 

The  algorithm  compares  its  actual  convergence  with  that  of  the 
bisection  method,  and  uses  an  adaptation  parameter  t  to  control  the 
value  of  a.  If  on  an  iteration  the  algorithm  reduces  the  interval 
of  uncertainty  by  at  least  half,  the  algorithm  increases  the 
robustness  parameter  toward  1.  Conversely,  on  iterations  when  the 
interval  of  uncertainty  is  not  halved,  a  is  decreased  toward  0. 
The  new  value  of  a  is  a  convex  combination  of  the  present  value  and 
the  desired  endpoint  value.  The  control  algorithm  is 

1.  Initialize  a  and  b. 

2.  Let  8  =  b  -  a . 

3.  Perform  an  iteration,  and  update  a  and  b. 

4.  If  (b  -  a)/e  >  .5,  let  a  =r0  +  (1  -  r )a  =  (1  -  x )o. 

Otherwise,  let  a  =  rl  +  (1  -  x)a  =  x  +  (1  -  r)o. 

5.  Go  to  2. 

The  adaptation  parameter  is  also  an  input  to  the  subroutine; 
we  use  a  value  of  0.5.  A  further  advantage  to  the  hybrid  algorithm 
is  that  it  also  can  be  used  as  a  simple  regula  falsi  or  bisection 
algorithm,  if  the  adaptation  parameter  is  set  to  0,  and  the 
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|  Appendix  B.  Probability  Analysis  for  Drop-check  Intervals 

\  Assume  that  we  are  starting  a  new  drop-check  interval  and  that 

the  nnmber  of  constraints  in  I+  is  m+.  Let  us  temporarily  assume 
that  m+  >  1.  Each  upcoming  Phase  1  iteration  will  increment  the 
violation  count  for  one  constraint  in  I+.  To  calculate  the  minimum 
length  of  the  drop-check  interval,  we  model  the  probability 
distribution  of  violated  constraints  as  a  multinomial  distribution 
with  an  equal  probability  that  each  truly  active  constraint  is 
found  to  be  violated  on  a  Phase  1  iteration. 

Ve  define  success  of  the  drop-check  to  be  when  all  truly 
+  + 

active  constraints  in  I  are  kept  in  I  ;  that  is,  no  active 

constraint  is  dropped  from  I+  because  it  had  a  zero  violation  count 

when  checked.  Under  the  hypothesis  that  all  m+  constraints  in  I+ 

are  truly  active,  we  calculate  the  drop-check  interval  as  the 

number  of  Phase  1  iterations  required  to  achieve  at  least  a  0.99 
probability  of  a  successful  drop-check.  After  k  =  Ak  Phase  1 
iterations  in  this  drop-check  interval,  let  v  be  the  m+-dimensional 
vector  of  violation  counts  for  the  constraints  in  I*.  A  drop-check 
is  successful  if  and  only  if  all  elements  of  v  are  positive. 
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The  probability  of  success  is  then  the  ratio  of  the  number  of 
permutations  of  v  where  all  m+  elements  are  positive,  to  the  number 
of  permutations  of  v.  That  is 


Pr  {Success}  = 


U...I  - - - 

v,  v-  v  +  (v,  ! )  (v.  ! ) . . .  (v  +!) 

12  m  12  m 


where :  v .  €  {1. . . } 

i 


for  i  =  1 . 


+ 

.m 


+ 


This  probability  is  more  easily  calculated  in  a  recursive 
manner.  Let  us  define  p(k,m+)  as  the  above  probability  of 
success.  The  event  of  failure  for  this  drop-check  can  be 
partitioned  into  failure  when  only  one  element  of  v  is  positive, 
failure  when  exactly  two  elements  of  v  are  positive,  and  so  on  up 
to  failure  when  exactly  (m  -  1)  elements  of  v  are  positive.  The 
probability  that  exactly  j  elements  of  v  axe  positive  is  the  number 
of  permutations  of  v  having  exactly  j  positive  elements  divided  by 
the  number  of  permutations  of  v.  The  number  of  permutations  of  v 
having  exactly  j  positive  elements  is  the  number  of  ways  of 
choosing  different  sets  of  j  positive  elements  from  the  m+ 


available,  times  the  number  of  permutations  of  j  elements  where  all 
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j  elements  are  positive.  Recall  that  p(k,j)  is  the  number  of 
permutations  of  j  elements  where  all  j  elements  are  positive, 
divided  by  all  permutations  of  j  elements.  Ve  therefore  have 


p(k,m+)  =  Pr 


{Success} 


=  1  -  Pr  {Failure} 


+ 

m 


-  1 


Pr  {Failure  with  exactly  j 


®+\  jk  P(k,j) 

j  /  (m+)k 


positive} 


We  define  p(k,l)  =  1  since  that  single  constraint  surely  has  all  k 
violation  counts  and  thus  it  can  not  be  dropped  erroneously. 


Define  k(m+)  as  the  least  k  such  that  p(k,m+)  2.  0*99.  The 
results  of  these  recursive  calculations  for  m+  =  2... 50  showed  the 
relationship  between  m+  and  k(m+)  plotted  in  Figure  B.l.  To  avoid 
having  to  store  a  tabulation  of  the  exact  results,  we  approximated 


the  relationship  with  a  quadratic  function.  The  approximation  used 
+  +2  + 

was  k'(m  )  =  Pj(m  )  +  p2m  +  p^.  Using  the  EA,  we  solved  the 

nonlinear  program 

50 

min  Ji.  [k'(m+)  -  k(m+)]^ 
m  =  2 

subject  to  k'(m+)  2.  k(m+),  m+  =  2... 50 

and  found  the  optimal  point  to  be: 

P;1  *  +0.02  4  93  3  4  3  7  0  53  5  5  9  95 
p2  =  +7.385572218142679 
p3  =  -6.869633070032582 

The  approximating  function  is  also  shown  in  Figure  B.l. 

The  discussion  above  assumed  that  m  >1,  and  did  not  specify 
the  effect  of  Phase  2  iterations.  If  m+  =  0  then  there  is  no 
active  set  from  which  to  drop  constraints,  and  drop-checks  are  not 
used. 

If  the  proportion  of  Phase  2  iterations  during  a  drop-interval 


is  small,  we 

ignore  these 

iterations  as 

not  contributing 

any 

information 

about 

which 

constraints  are 

active.  However, 

if  a 

sufficiently 

long 

interval 

contains  only  Phase  2  iterations. 

the 

algorithm 

should 

eventually  drop  all 

constraints  in  I* 

as 

T 


Appendix  C.  Analysis  for  Add-check  Intervals 

Again,  let  m+  be  the  number  of  constraints  in  I*.  If  m+  =  m, 
then  all  record  points  found  are  true  record  points,  and  the 
active  set  strategy  does  not  need  to  perform  add-checks  or  save 
ellipsoid  data  for  possible  backtracks.  The  discussion  below 
considers  m+  <  m. 

The  initial  add-check  interval  is  taken  as  Ak+  =  1,  meaning 
that  every  maybe— record  point  is  tested  to  see  if  it  is  also 
I  -feasible.  When  we  feel  that  I  also  has  constraints  that  might 
be  violated,  this  value  is  appropriate  since  all  cuts  are  then 
solution-preserving.  However,  as  I+  grows  more  stable,  we  would 
like  the  algorithm  to  proceed  toward  a  computationally  more 
efficient  interval.  There  is  a  tradeoff  in  the  computational 
factors  involved. 

Lengthening  Ak+  decreases  the  effort  spent  checking  I  .  The 
amount  of  effort  saved  varies  with  the  number  of  constraints  in  I 
versus  I*.  When  m+  =  m,  there  is  no  effort  associated  with 
checking  I  ,  and  the  add-check  interval  should  not  grow.  That  is, 
a  multiplicative  growth  factor  of  1  should  always  be  used  when 
m+  m  m.  When  m+  ■  0,  an  add-check  is  highest  in  effort  since 
checking  I  requires  all  m  constraint  functions  to  be  evaluated. 
To  minimize  this  high  cost,  we  would  permit  the  growth  factor 


L5 


to  approach  the  upper  bound  of  2  when  m+  =  0. 

The  other  computational  effort  to  consider  when  lengthening 

Ak+  is  the  penalty  incurred  performing  iterations  after  a 

backtrack.  The  effort  required  for  the  EA  update  formulae  is  of 
2 

order  n  .  Foi  these  reasons,  we  use  a  growth  factor  which  is  only 

permitted  to  double  the  add-check  interval  when  n  =  1,  and 

2 

decreases  toward  1  with  n  . 

We  set  r  =  m  /m  and  define  the  growth  factor  u  as 

u  =  r(l)  +  (1  -  r) (1  +  n“2). 

This  yields  the  desired  values  at  the  endpoints,  u  =  1  and  u  =  2. 
An  adaptive  method  is  used  which  adjusts  intermediate  values  of  u 
closer  to  1  or  2  as  appropriate,  to  control  algorithm  behavior. 
The  mapping  uses  a  control  factor  z,  and  produces  an  adjusted 
growth  factor  w  by 


w  =  =  1  +  (u  -  l)(z  "  1)7(2  "  z) 

A  value  z  =  1.5  causes  w  =  z;  z  =  1  causes  w  =  1;  and  z  =  2  is 
defined  as  w  =  2  unless  u  =  1  (where  we  use  w  =  1  to  ensure  no 


growth  when  r  =  1) 
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Figure  D1.3  Col  3:  G'  Deep  Cuts  Uithout  Linesearch 
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Figure  D1.5  Col  8:  G’  Deep  Cuts  Uithout  Linesearch 
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Figure  D1.6  Den  1;  G*  Deep  Cuts  Uithout  Linesearch 


Figure  D1.7  Den  2:  G’  Deep  Cuts  Uithout  Linesearch 
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Figure  01.8  Dem  3:  G*  Deep  Cuts  Uithout  Llnesearch 
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Figure  D1.10  Den  5:  G*  Deep  Cuts  Uithout  Linesearch 
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Figure  D1.11  Den  6:  G*  Deep  Cuts  Uithout  Linesearch 
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Figure  D1.15  Col  2:  S'  Deep  Cuts  Uithout  Linesearch 
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Figure  01.18  Col  8:  S’  Deep  Cuts  Uithout  Linesearch 
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Figure  01.19  Dem  1:  S’  Deep  Cuts  Uithout  Linesearch 
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Figure  01.22  Den  4:  S'  Deep  Cuts  Uithout  Linesearch 


Error  vs  Effort 

Dembo  5 
Sun  Jul  10/83 


fl D-A122  599  ELLIPSOID  ALGORITHM  VARIANTS  IN  NONLINEAR  PROGRAMING 

(U)  AIR  FORCE  INST  OF  TECH  WRIGHT-PATTERSON  AFB  OH 
S  T  DZIUBAN  AUG  82  AFIT/CI/NR-82-26D 


UNCLASSIFIED 


F/G  12/1  NL 


MICROCOPY  RESOLUTION  TEST  CHART 

NATIONAL  BUREAU  Of  STANDARDS- 1963-A 


Figure  01.26  Dei  8:  S'  Deep  Cuts  Without  Linesearch 


183 


Log  Relative  Combined  Solution  Error 

,-20.00  -16.00  -12.00  -8.00  -4.00  0.00  4.00  8.00 


Total  PSCPU  Time  (sec) 


Figure  02.2  Col  4:  lliniiization  at  0.1 
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Figure  02.4  Dei  8:  niniiization  at  0.1 
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Figure  02.6  Col  4:  Minimization  at  0.01 
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Figure  D2.L  Dt..  8:  ninimization  at  0.01 
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Figure  02.11  Den  4:  Minimization  at  0.001 


Error  vs  Effort 

Deabo  8a 
Ued  Jul  06/83 


Muller  ain  .001 


■In  .001 


Error  vs  Effort 

Colville  1 
Ued  Jul  06/83 


riuller  zero  .1 
Bisection  zero  .1 


g  Relative  Combined  Solution  Error 


Error  vs  Effort 

Deabo  4a 
Ued  Jui  06/83 


Muller  zero  .1 
Bisection  zero  .1 
Regula  zero  .1 


Figure  02.16  De>  8:  Zero-finding  at  0.1 
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Figure  02.20  De«  8:  Zero-finding  at  0.01 
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Figure  D3.4  Col  4:  Deep  Cuts  Using  Linesearches 
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Figure  D3.10  Den  5:  Deep  Cuts  Using  Linesearches 
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Figure  D3.12  Dem  7:  Deep  Cuts  Using  Linesearches 
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Figure  D4.3  Col  3:  Three  Sliple  Strategies 
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Figure  04.7  De»  2;  Three  Simpie  Strategies 
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Figure  D4.9  Dea  4:  Three  Simple  Strategies 
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Figure  D4.11  Dea  6:  Three  Simple  Strategies 
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Figure  04.12  Deni  7:  Three  Simple  Strategies 
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Figure  D5.5  Col  8:  Active  Set  Strategy 
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Figure  D5.10  Oem  5:  Active  Set  Strategy 
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Figure  D5.11  Dea  6:  Active  Set  Strategy 
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Figure  05.13  Den  8:  Active  Set  Strategy 
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Figure  D6.3  Col  3;  Record-first  Strategy 
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Figure  D6.6  De»  1:  Record-first  Strategy 
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Figure  D6.9  Oe«  4?  Record-first  Strategy 
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Figure  06.11  De«  6:  Record-first  Strategy 
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